Description

[0001] The present invention relates to the management and distribution of electronic media, particularly, although
not exclusively, over a network.

[0002] With the advent of the computer, and particularly the networking of computers, the ability of organisations and
individuals to rapidly generate, store, access and process data has increased dramatically. In the case of many
organisations, the ability to manage and leverage data has become a central aspect of their business.
[0003] Not surprisingly, considerable effort and development has occurred in those computational and software fields
related to the generation, storage, accessibility and processing of data. Nevertheless, it has been the case that as
organisations have moved to a distributed architecture paralleling the development of the Internet, the complexity
involved in providing solutions across different platforms and operating systems has become ever more challenging.
Consequently, developers have tended to concentrate on limited solutions for preferred platforms and operating
systems. Similarly, organisations have sought to standardise the tools they use to leverage data.
[0004] Unfortunately, the pull exerted by those distributed computing models currently finding favour is in direct
contradiction to the solutions adopted by the majority of developers and those responsible within organisations for the
selection of tools. Consequently, the management and distribution of data, particularly of high value media content,
remains problematic.

[0005] Thus, according to one aspect of the present invention, there is provided a query resolution system comprising
one or more archives containing a plurality of persistent data entities, each entity including metadata in the form of a
group of properties having property values assignable thereto, at least some of those properties providing a definition
of a predetermined level of scope such that within a set of related data entities, the scope of an entity at a higher level
encompasses the scope of related entities at a lower level of scope, a registry database operable to extract from said
one or more archives those data entities having predetermined properties including said definition of a predetermined
level of scope, and a query resolution engine operable in response to a request from a query interface to identify
extracted entities whose property values fulfil the request.

[0006] Advantageously, some, at least, of the property values are dynamically generated for inclusion in the registry
database. Furthermore, the system may extract entities from more than one archive in order to generate a so-called
search space on which the query resolution engine operates. Moreover, because the entities extracted from the
archives have different scope, it is possible for the query resolution engine to identify entities of a specific scope in
response to a request. Conveniently, those entities identified as fulfilling the request may be scored
in accordance with a predetermined algorithm to indicate their potential relevance. Such scoring may be generated on
a scale of zero to one hundred, with zero indicative of no relevance and one hundred indicative of a complete match.
Other such scoring approaches may, of course, be utilised.
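
By way of a non-limiting illustration only, the zero to one hundred relevance scale might be sketched as follows. The scoring rule used here (the percentage of requested property values that an entity matches) is an assumption made for the example and is not the predetermined algorithm of the specification; the names `score` and `resolve`, and the flat dictionary form of an entity, are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical flat form of an extracted entity: property name -> property value.
Entity = Dict[str, str]


def score(entity: Entity, request: Dict[str, str]) -> int:
    """Return 0..100: the percentage of requested properties the entity matches.

    Assumed scoring rule for illustration; 0 means no relevance and
    100 means every requested property value is matched.
    """
    if not request:
        return 0
    hits = sum(1 for name, value in request.items() if entity.get(name) == value)
    return round(100 * hits / len(request))


def resolve(registry: List[Entity], request: Dict[str, str]) -> List[Tuple[int, Entity]]:
    """Score every extracted entity and return the relevant ones, best first."""
    scored = [(score(entity, request), entity) for entity in registry]
    relevant = [pair for pair in scored if pair[0] > 0]
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)


registry = [
    {"language": "en", "release": "1"},
    {"language": "fi", "release": "1"},
    {"language": "de", "release": "2"},
]
results = resolve(registry, {"language": "en", "release": "1"})
```

In this sketch the first entity scores one hundred, the second scores fifty, and the third is omitted as having no relevance.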

[0007] According to a further aspect of the invention, there is provided a query resolution service for use in an
object-oriented programming environment including one or more archives containing a plurality of persistent data entities,
each entity including metadata in the form of a group of properties having property values assignable thereto, at least
some of those properties providing a definition of a predetermined level of scope such that within a set of related data
entities, the scope of an entity at a higher level encompasses the scope of related entities at a lower level of scope,
the service comprising extracting from said one or more archives those entities having predetermined properties
including said definition of a predetermined level of scope and identifying, in response to a request, those extracted
entities whose property values fulfil said request.

[0008] Preferably, the service includes within the functionality of the interface the ability to respond to human rather
than agent generated requests. Such functionality may be provided in a further interface suited to a particular
environment. Thus a web based interface such as a Common Gateway Interface (CGI) may be used. The CGI or other suitable
web-based component converts a user request into the appropriate format. Similarly, the response is converted into
a form, typically HTML, which may be rendered on a device for viewing by the user.

[0009] According to a still further aspect of the invention, there is provided a registration database for connection to
one or more archives containing a plurality of persistent data entities, each entity including metadata in the form of a
group of properties having property values assignable thereto, at least some of those properties providing a definition
of a predetermined level of scope such that within a set of related data entities, the scope of an entity at a higher level
encompasses the scope of related entities at a lower level of scope, the database being operable to extract from said
one or more archives those data entities having predetermined properties including said definition of a predetermined
level of scope.

[0010] Such a database may be implemented on a network or stand-alone basis. In the former case, the network
may be fixed and/or mobile in composition.

[0011] Thus, according to a further aspect of the invention, there is provided a terminal for connection to a registration
database, said database being connected to one or more archives containing a plurality of persistent data entities,
each entity including metadata in the form of a group of properties having property values assignable thereto, at least
some of those properties providing a definition of a predetermined level of scope such that within a set of related data
entities, the scope of an entity at a higher level encompasses the scope of related entities at a lower level of scope,
the database being operable to extract from said one or more archives those data entities having predetermined
properties including said definition of a predetermined level of scope, the terminal being operable in response to user input
to generate a request to identify extracted entities whose property values are defined in said input.

[0012] In order to understand the present invention more fully, a number of embodiments thereof will now be described
by way of example and with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a network operating in accordance with a framework of the invention;
Figure 2 is a schematic diagram illustrating the components of the framework of Figure 1;
Figure 3 is a block diagram of an identity architecture of the framework of Figure 1; and
Figure 4 is a block diagram of a registry service of the framework of Figure 1.

[0013] It should be noted that in order to improve the readability of the specification, portions of the description
relating to the embodiment described below have been included as Appendices I to V. Where appropriate, reference
has been made to a relevant Appendix. It will, of course, be understood by those skilled in the art that the
Appendices are intended to form part of the present disclosure.

[0014] A network 1 includes an HTTP web server 3 accessible 4 by production clients 5 operating a number of
operating systems on various platforms and a set of on-line distribution clients 7. Included amongst the on-line distribution
clients 7 is a wireless terminal 9 utilising Wireless Mark-up Language (WML). As such, the terminal accesses 6 the
HTTP web server 3 indirectly via a WAP server 11, which provides the necessary translation 8 between HTTP and WML.
The HTTP web server 3 further provides a Common Gateway Interface (CGI). In addition to these physical elements
of the network 1, data exchanged with the HTTP web server 3 is also exchangeable 10 with an Agent pool 13 made
up of a number of core software components or agents 13a, 13b, 13c, 13d providing services which will be elaborated
upon below. Data exchanged 10 with the HTTP web server 3 by the Agent pool 13 may be transferred 12 between
agents 13a, 13b, 13c. The Agent pool 13 has additional connections: firstly, a connection 14 to a customer documentation
server 15 capable of providing both on-line 17 and hard media 19 access to users and, secondly, a connection 16 to a
set of one or more archives 21 which themselves may be monitored and managed through an on-line connection 18
to a remote terminal 23.

[0015] Figure 2 illustrates on a conceptual level the relationships (indicated by arrows in the Figure) existing between
elements of the embodiment. Thus, a Media Attribution and Reference Semantics (MARS) 25 provides a core standard
vocabulary and semantics utilising metadata for facilitating the portable management, referencing, distribution, storage
and retrieval of electronic media. As will be further described below, MARS 25 is the common language by which
different elements of the embodiment communicate. A Generalised Media Archive (GMA) 27 provides an abstract
archival model for the storage and management of data based solely on metadata defined by MARS 25. At a physical
level, a Portable Media Archive (PMA) 29 provides an organisational model of a file system based data repository
conforming to and suitable for implementations of the Generalised Media Archive (GMA) abstract archival model.
Finally, a Registry Service Architecture (REGS) 31 is provided which permits dynamic query resolution by agencies
including users and software components or agents utilising MARS 25, thereby providing a unified interface model for
a broad range of search and retrieval tools.

[0016] As has been previously indicated, the framework is based on a web server 3 running on a platform which
provides basic command line and standard input/output stream functionality. An agent 13 provides two interfaces: a
combined Hypertext Transfer Protocol (HTTP) and Common Gateway Interface (CGI), HTTP+CGI, and a Portable
Operating System Interface (POSIX) command line + standard input/output/error. In addition to these interfaces, the
agent may provide further interfaces based on Java method invocation and/or Common Object Request Broker
Architecture (CORBA) method invocation. An agent (or other user, client, or process) is free to choose among the available
interfaces with which to communicate, including communication with another such agent 13. In addition, the framework
allows non-agent systems, processes, tools, or services which are utilised by an agent 13 to be accessed via proprietary
means if necessary or useful for any operations or processes outside of the scope of the architecture. Thus, tools and
services intended for the architecture can co-exist freely with other tools and services utilising the same resources.
[0017] In more detail, the protocols on which the framework is based include HTTP, which is an application-level
protocol for distributed, collaborative, hypermedia information systems. As a generic, stateless protocol, HTTP can be
used for many tasks beyond hypertext. Thus, it may also be used with name servers and distributed object management
systems, through extension of its request methods, error codes and headers. A particularly useful feature of HTTP is
the typing and negotiation of data representation, allowing systems to be built independently of the data being
transferred.

[0018] CGI is a standard for interfacing external applications with information servers, such as Web servers. CGI
serves as the primary communication mechanism between networked clients and software agents within the
framework.

[0019] POSIX is a set of standard operating system interfaces based on the UNIX operating system. The POSIX
interfaces were developed under the auspices of the IEEE (Institute of Electrical and Electronics Engineers). The
framework adopts the POSIX models for command line arguments, standard input streams, standard output streams,
and standard error streams.

[0020] CORBA specifies a system which provides interoperability between objects in a heterogeneous, distributed
environment that is transparent to a database programmer. Its design is based on the Object Management Group
(OMG) Object Model. Framework agents may utilise CORBA as one of several means of agent intercommunication.
[0021] Java (Registered Trade Mark) is both a programming language and a platform. Java is a high-level
programming language intended to be architecture-neutral, object-oriented, portable, distributed, high-performance,
interpreted, multithreaded, robust, dynamic, and secure. The Java platform is a "virtual machine" which is able to run any Java
program on any machine for which an implementation of the Java virtual machine (JVM) exists. Most operating systems
commonly in use today are able to support an implementation of the JVM. The core software components and agents
provided by the framework may be implemented in Java.

[0022] Metadata is held within the framework using a naming scheme which is compatible across a broad range of
encoding schemes, including, but not limited to, the following programming, scripting and command languages:
[0023] C, C++, Objective C, Java, Visual BASIC, Ada, Smalltalk, LISP, Emacs Lisp, Scheme, Prolog, JavaScript/ECMAScript,
Perl, Python, TCL, Bourne Shell, C Shell, Z Shell, Bash, Korn Shell, POSIX, Win32, REXX, SQL.
[0024] The naming scheme is also compatible with, but not limited to, the following mark-up and typesetting languages:
[0025] SGML, XML, HTML, XHTML, DSSSL, CSS, PostScript, PDF.
[0026] Equally, the naming scheme is also compatible with, but not limited to, the following file systems:
[0027] FAT (MS-DOS), VFAT (Windows 95/98), NTFS (Windows NT/2000), HFS (Macintosh), HPFS (OS/2), HP/UX,
UFS (Solaris), ext2 (Linux), ODS-2 (VMS), NFS, ISO 9660 (CDROM), UDF (CDR/W, DVD).
[0028] In order to provide such compatibility, the naming scheme utilises an explicit, bound, and typically ordinal set
of values referred to hereinafter as a token. The token may comprise any sequence of characters beginning with a
lowercase alphabetic character followed by zero or more lowercase alphanumeric characters with optional single
intervening underscore characters. More specifically, any string matching the following POSIX regular expression:

/[a-z](_?[a-z0-9])*/

Examples:
[0029]

abcd
ab_cd
a123
x2_3_4_5
here_is_a_very_long_token_value



[0030] By defining MARS metadata properties in a token format, an agent 13 or other tool is able to operate more
efficiently as a result of its processes being based on controlled sets of explicitly defined values rather than those based
on arbitrary values.
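
Purely as an illustrative sketch, the token format described above (a lowercase letter followed by zero or more lowercase alphanumeric characters, with optional single intervening underscores) might be checked as follows. The helper name `is_token` is hypothetical and does not form part of MARS.

```python
import re

# A lowercase letter, then zero or more lowercase alphanumerics, each of which
# may be preceded by at most one underscore.
TOKEN_RE = re.compile(r"[a-z](_?[a-z0-9])*")


def is_token(value: str) -> bool:
    """Return True if the whole string is a well-formed token."""
    return TOKEN_RE.fullmatch(value) is not None


valid = ["abcd", "ab_cd", "a123", "x2_3_4_5", "here_is_a_very_long_token_value"]
invalid = ["Abcd", "1abc", "_abc", "abc_", "ab__cd", ""]
```

Using `fullmatch` rather than `match` ensures that a trailing underscore or any other trailing character causes the string to be rejected.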

[0031] A token provides the structure through which the framework is able to define metadata in the form of a property,
this property being representative of a quality or attribute assigned or related to an identifiable body of information.
The property thus comprises an ordered collection of one or more values sharing a common name. The name of the
property represents the name of the collection and the value(s) represent the realisation of that property. In accordance
with the token structure adopted in the framework, constraints are placed on the values which may serve as the
realisation of a given property. A property set is thus any set of MARS 25 properties. Further details of the property types
allowed under MARS 25 are to be found in Appendix II. Certain property values are also defined under MARS 25 and
may also be found in Appendix II. These include the property value of count, which may be single, meaning that at most
there may be one value for a given property, or multiple, meaning that there may be one or more values for a given
property. Another property value is range, which for any given property may be bounded or unbounded. In addition, the
property value of ranking provides that, for any given property, the set of allowed values for that property may be ordered
by an implicit or explicit ordinal ranking, either presumed by all applications operating on or referencing those values
or defined. Some property value types are ranked implicitly due to their type and subsequently the value ranges of all
properties of such types are automatically ranked; examples of such property types include Integer, Count, Date, Time
and the like. Most properties with ranked value ranges are token types having a controlled set of allowed values which
have a significant sequential ordering, such as status, release, milestone and the like.

[0032] Ranking, if it is applied, may be either strict or partial. With strict ranking, no two values for a given property
may share the same ranking. With partial ranking, multiple values may share the same rank, or may be unspecified
for rank, having the implicit default rank of zero.

[0033] Ranked properties may only have single values. This is a special constraint which follows logically from the
fact that ranking defines a relationship between objects having ranked values, and comparisons between ranked values
become potentially ambiguous if multiple values are allowed. E.g. if the values x, y, and z for property P have the
ranking 1, 2, and 3 respectively, and object 'foo' has the property P(y) and object 'bar' has the property P(x,z), then a
boolean query such as "foo.P < bar.P?" cannot be resolved to a single boolean result, as y is both less than z and
greater than x, and thus the query is both true and false, depending on which value is chosen for bar.P (i.e. foo.P(y)
< bar.P(x) = False, while foo.P(y) < bar.P(z) = True).
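
The ambiguity in the foregoing example may be illustrated by the following short sketch, which evaluates the comparison for every pairing of values. The function name `less_than_outcomes` and the dictionary `RANK` are hypothetical names introduced for the example only.

```python
# Ranking of the allowed values of property P, as in the example above.
RANK = {"x": 1, "y": 2, "z": 3}


def less_than_outcomes(left_values, right_values):
    """Return the set of results of 'left < right' over every pairing of values."""
    return {RANK[a] < RANK[b] for a in left_values for b in right_values}


foo_p = ["y"]        # object 'foo' has the property P(y)
bar_p = ["x", "z"]   # object 'bar' has the property P(x,z)

ambiguous = less_than_outcomes(foo_p, bar_p)
unambiguous = less_than_outcomes(["y"], ["z"])
```

Where both objects carry a single value, the set of outcomes contains exactly one boolean; with the multi-valued `bar_p` it contains both, which is why ranked properties are restricted to single values.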

[0034] Ranking for all property types other than token is defined implicitly by the data type, usually conforming to
fundamental mathematical or industry standard conventions. Ranking for token property values is specified using
Ranking. In either case, and as has already been stated, ranking may be strict in the sense that the set of allowed
values for the given property corresponds to a strict ordering, and each value is associated with a unique ranking within
that ordering. Alternatively, ranking may be partial in the sense that the set of allowed values for the given property
corresponds to a partial ordering, and each value is associated with a ranking within that ordering, defaulting to zero
if not otherwise specified. Finally, ranking may not be applied, such that the set of allowed values for the given property
corresponds to a free ordering, and any ranking specified for any value is disregarded.
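
As a minimal sketch of the three cases just described, a set of allowed token values might be resolved to ranks as follows. The mode names "strict", "partial" and "none", the function name `resolve_ranks`, the treatment of an unspecified rank under strict ranking as an error, and the representation of a disregarded ranking as zero are all assumptions made for the purpose of illustration.

```python
from typing import Dict, Optional


def resolve_ranks(mode: str, ranks: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Map each allowed value to its effective rank under the given ranking mode."""
    if mode == "none":
        # Free ordering: any rank specified for any value is disregarded
        # (represented here, by assumption, as rank zero).
        return {value: 0 for value in ranks}
    if mode not in ("strict", "partial"):
        raise ValueError(f"unknown ranking mode: {mode!r}")
    # An unspecified rank takes the implicit default rank of zero.
    resolved = {value: 0 if rank is None else rank for value, rank in ranks.items()}
    if mode == "strict":
        if None in ranks.values():
            raise ValueError("strict ranking requires an explicit rank for every value")
        if len(set(resolved.values())) != len(resolved):
            raise ValueError("strict ranking forbids two values sharing a rank")
    return resolved


partial = resolve_ranks("partial", {"draft": 1, "review": 1, "misc": None})
strict = resolve_ranks("strict", {"draft": 1, "approved": 2})
free = resolve_ranks("none", {"draft": 5, "approved": 7})
```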

[0035] With reference to Figure 3, the framework defines an identity architecture 33 having a set of nested
predetermined definitions of specific scope each utilising tokens to hold information. At the lowest level of scope, a Storage
Item 35 corresponds to what would typically be stored in a single file or database record, and is the physical
representation of the data which the framework is capable of manipulating. Thus, Items 35 are the discrete computational objects
which are passed from process to process, and which form the building blocks from which the information space and
the environment used to manage, navigate, and manipulate it are formed. Hence, an Item 35 may embody content,
content fragments, metadata, revision deltas, or other information.

[0036] At the next highest level of scope, a Media Component 37 defines a particular realisation of a defined token
value. Thus, the Component 37 defines at an abstract level properties and characteristics of one of the following
non-exhaustive content types, namely data, metadata, table of contents, index or glossary. A data content type might include
a language, area of coverage, release or method of encoding. A component 37 is linked to one or more storage items
35 which relate to the content at a physical level.

[0037] Immediately above the level of scope of the Media Component 37 is a Media Instance 39. The media instance
39 is made up of a number of media components 37, each of which relates to a particular property of an identifiable body
of information. Thus, a particular Media Instance 39 will comprise a set of properties 37, namely a specific release,
language, area of coverage and encoding method.

[0038] Finally, the highest level of scope is a Media Object 41 which represents a body of information corresponding
to a common organisational concept such as a document, book, manual, chapter, section, sidebar, table, image, chart,
diagram, graph, photograph, video segment, audio stream or the like. However, the body of information is abstract to
the extent that no specification is made of any particular language, coverage, encoding or indeed release. Thus,
depending on the presence or otherwise of information at the lower levels of scope, dictated ultimately by the existence
or otherwise of a relevant Storage Item 35, it may be possible to realise some, if not all, particular media instances 39
corresponding to that media object 41.

[0039] In order to allow for referencing of specific content, namely a fragment within a given item, component,
instance, or object, MARS 25 adopts the World Wide Web Consortium (W3C) proposal for the XPointer standard for
encoding such content specific references in SGML, HTML, or XML content. A fragment will be understood by those
skilled in the art to be an identifiable linear sub-sequence of the data content of a component 37, either static or
reproducible, which is normally provided where the full content is either too large in volume for a particular application
or not specifically relevant. Those skilled in the art will also be aware of the W3C XPointer proposal; however, further
details may be found from the W3C website which is presently located at www.w3c.org. XPointer is based on the XML
Path Language (XPath). Through the selection of various properties, such as element types, attribute values, character
content, and relative position, XPointer supports addressing within internal structures of XML documents and allows
for traversals of a document tree. Thus, in place of structural references to data, the framework may provide that explicit
element ID values are used for all pointer references, thereby avoiding specific references to structural paths and data
content. As a result, the framework ensures the maximal validity of pointer values to all realisations of a given media
object, irrespective of language, coverage, encoding, or partitioning. In addition to the XPointer standard proposal,
other alternative/additional internal pointer mechanisms for other encodings may be utilised.
[0040] In addition to the above-described architecture, the framework provides rules which relate to the inheritance
and versioning of the scoped definitions. Thus, the framework provides that metadata defined at higher scopes is
inherited by lower scopes. This is provided for by ensuring that two rules are applied. Firstly, all metadata properties
defined in higher scopes are fully visible, applicable, and meaningful in all lower scopes, without exception. Secondly,
any property defined in a lower scope completely supplants any definition of the same property that might exist in a
higher scope. Consequently, all metadata properties defined for a media object 41 are inherited by all instances 39 of
that object; and all metadata properties defined for a media instance 39 or media object 41 are inherited by all of its
components 37.
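
The two inheritance rules may be sketched, by way of example only, using a layered lookup in which the lowest scope is consulted first. The function name `effective_metadata` and the property names and values shown are hypothetical and are not taken from MARS.

```python
from collections import ChainMap
from typing import Dict


def effective_metadata(
    media_object: Dict[str, str],
    media_instance: Dict[str, str],
    media_component: Dict[str, str],
) -> Dict[str, str]:
    """Resolve the metadata visible at component scope.

    Rule 1: every property defined at a higher scope is visible at lower scopes.
    Rule 2: a property defined at a lower scope supplants the same property
    defined at a higher scope (ChainMap consults its first mapping first).
    """
    return dict(ChainMap(media_component, media_instance, media_object))


resolved = effective_metadata(
    media_object={"language": "en", "status": "draft"},
    media_instance={"language": "fi"},
    media_component={"encoding": "xml"},
)
```

Here the component inherits `status` from the media object, while the instance-level `language` supplants the object-level definition.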

[0041] In relation to versioning, MARS 25 defines a versioning model using two levels of distinction. A first level is
defined as a release, namely a published version of a media instance which is maintained and/or distributed in parallel
to other releases. By way of example, a release could be viewed as a branch in a prior art tree based versioning model.
A second level is defined as a revision corresponding to a milestone in the editorial lifecycle of a given release or, by
way of example, a node on a branch of the prior art tree based model. MARS 25 defines and maintains versioning for
'data' storage items 35 only.
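
A minimal sketch of this two-level model, assuming that releases are identified by name and that revisions are numbered sequentially from one within each release, might take the following form. The class names `Release` and `VersionedDataItem` and the method `commit` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Release:
    """A published version maintained in parallel to other releases (a branch)."""
    name: str
    revisions: List[bytes] = field(default_factory=list)

    def add_revision(self, data: bytes) -> int:
        """Record a new editorial milestone and return its 1-based revision number."""
        self.revisions.append(data)
        return len(self.revisions)


@dataclass
class VersionedDataItem:
    """Versions of a single 'data' storage item, grouped by release."""
    releases: Dict[str, Release] = field(default_factory=dict)

    def commit(self, release: str, data: bytes) -> int:
        """Add a revision to the named release, creating the release if needed."""
        branch = self.releases.setdefault(release, Release(release))
        return branch.add_revision(data)


item = VersionedDataItem()
first = item.commit("1", b"first draft")
second = item.commit("1", b"second draft")
parallel = item.commit("2", b"reworked content")
```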

[0042] In addition to the identity architecture described above, MARS 25 provides a management architecture which
permits control of processes such as retrieval, storage, and version management. Details of the properties defined to
provide such functionality may be found in Appendix II. MARS 25 also provides affiliation properties which define an
organisational environment or scope where data is corrected and maintained. Examples of such properties can also
be found in Appendix II. MARS 25 further provides content properties which allow definition of data characteristics
independent of the production, application or realisation of that data. Again, examples of such properties can be found
in Appendix II. MARS 25 also provides encoding properties defining special qualities relating to the format, structure
or general serialisation of data streams. These properties are, of course, of significance to tools and processes
operating on that data. Yet again, examples of such properties can be found in Appendix II. MARS 25 also provides
association properties which define relationships relating to the origin, scope or focus of the content in relation to other data.
Examples of such properties may be found in Appendix II. Finally, MARS 25 provides role properties which specify
one or more actors who have a relationship with the data. An actor may be a real user or a software application such
as an agent. Examples of such properties may be found in Appendix II.
[0043] As has been previously mentioned, a Generalised Media Archive (GMA) 27, based on Media Attribution and
Reference Semantics (MARS) 25 metadata, provides a uniform, consistent, and implementation independent model
for the storage, retrieval, versioning, and access control of electronic media. Further details of the GMA may be found
in Appendix IV. The GMA 27 serves as the common archival model for all managed media objects controlled,
accessed, transferred or otherwise manipulated by agencies operating with the framework. Hence, the GMA 27 may
serve as a functional interface to a wide range of archive implementations whilst remaining independent of operating
system, file system, repository organisation, versioning mechanisms, or other implementation details. This abstraction
facilitates the creation of tools, processes, and methodologies based on this generic model and interface which are
insulated from the internals of the GMA 27 compliant repositories with which they interact.

[0044] The GMA 27 defines specific behaviour for basic storage and retrieval, access control based on user identity,
versioning, automated generation of variant instances, and event processing. The identity of individual storage items
35 is based on MARS metadata semantics and all interaction between a client and a GMA implementation must be
expressed as MARS 25 metadata property sets.

[0045] The GMA manages media objects 41 via media components 37 and is made up of storage items 35. The
GMA manages the operations of versioning, storage, retrieval, access control, generation and events as will be further
described below. Examples of pseudocode corresponding to the above and other managed operations carried out by
the GMA may be found in Appendix IV.

[0046] The GMA 27 operates solely on the basis of MARS 25 metadata and as a result of its operation the GMA 27
acts on that same metadata. The metadata operated on by the GMA 27 is restricted to management metadata rather
than content metadata. The former is metadata concerned with the history of the physical data, such as retrieval
and modification history, creation history, modification and revision status, whereas the latter is concerned with the
qualities and characteristics of the information content as a whole, independent of its management. Content metadata
is stored as a separate 'meta' component 37, not a 'meta' item 35, such that the actual specification of the content
metadata is managed by the GMA 27 just as any other media component 37. The metadata that is of primary concern
to a GMA 27, and which a GMA accesses, updates, and stores persistently, is the metadata associated with each
component 37.

[0047] A GMA 27 manages media components 37, and the management metadata for each media component 37
is stored persistently in the 'meta' storage item 35 of the media component 37. A special case exists with regard to
management metadata which might be defined at the media instance 39 or media object 41 scope, where that metadata
is inherited by all sub-components 37 of the higher scope(s) in accordance with the inheritance rules set out above.
[0048] In order to provide the necessary functionality, the GMA 27 requires that certain metadata properties are
defined in an input query and/or in respect of any target data depending on the action being performed and which
functional units are implemented. These properties are set out in Appendix IV Section 4.1.2-4. In accordance with
inheritance rules defined in MARS 25, retrieval of metadata for a given media component scope includes all inherited
metadata from media object and media instance scopes. In addition, the GMA 27 will assume the default values as
defined by the MARS 25 specification for all properties which it requires but that are not specified explicitly. It is an
error for a required property to have neither a default MARS 25 value nor an explicitly specified value. In addition to
relying on existing metadata definitions, the GMA 27 is responsible for defining, updating, and maintaining the
management metadata relevant for the 'data' item 35 of each media component 37, which is stored persistently as the
'meta' item 35 of the component 37.
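
The handling of required properties described above (an explicit value where given, otherwise the default, otherwise an error) may be sketched as follows. The function name `resolve_required` and the property names used are hypothetical; the actual required properties and defaults are those set out in the Appendices and are not reproduced here.

```python
from typing import Dict, Iterable


def resolve_required(
    required: Iterable[str],
    explicit: Dict[str, str],
    defaults: Dict[str, str],
) -> Dict[str, str]:
    """Return a value for every required property.

    An explicitly specified value takes precedence over the default; it is an
    error for a required property to have neither.
    """
    resolved = {}
    for name in required:
        if name in explicit:
            resolved[name] = explicit[name]
        elif name in defaults:
            resolved[name] = defaults[name]
        else:
            raise KeyError(f"required property {name!r} has neither an explicit nor a default value")
    return resolved


# Hypothetical property names and values, for illustration only.
values = resolve_required(
    required=["language", "release"],
    explicit={"language": "fi"},
    defaults={"language": "en", "release": "1"},
)
```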

[0049] The GMA 27 stores 'meta' items 35, containing management metadata, in any internal format; however, the
GMA must accept and return 'meta' storage items as XML (Extensible Mark-up Language) instances. Content
metadata constituting the data content of a 'meta' component 37, and stored as the 'data' item 35 of the 'meta' component
37, must always be a valid XML instance.

[0050] These two constraints ensure that an agent interacting with the GMA 27 is able to retrieve from or store to
the GMA 27 both content and management metadata as needed. The GMA 27 is also able, as a consequence of these
constraints, to resolve inherited management metadata from meta components at higher scopes in a generic fashion.
[0051] In order to store and retrieve items, the GMA 27 associates electronic media data streams to MARS 25 storage
item identities and makes persistent, retrievable copies of those data streams indexed by their MARS 25 identity. The
GMA 27 also manages the corresponding creation and modification of time stamps in relation to those items. The GMA
27 organises both the repository 21 of storage items 35 as well as the mapping mechanisms relating MARS identity
metadata to locations within that repository 21. The GMA 27 may be implemented in any particular technology including,
but not limited to, common relational or object oriented database technology, direct file system storage, or any number
of custom and/or proprietary technologies.

[0052] In addition to the core storage and retrieval actions provided by the GMA 27, the GMA 27 is capable of
providing the functionality necessary to permit operations by agents in relation to versioning, access control, generation,
and/or events. To the extent that such functionality is provided by the GMA 27, it will exhibit a pre-defined behaviour.
[0053] Thus, if the GMA 27 implements access control, then access control of media components 37 is based on
several controlling criteria as defined for the environment in which the GMA resides and as stored in the metadata of
individual components managed by the GMA. Access control is defined for entire components and never for individual
items within a component. Access control may also be defined for media objects 41 and media instances 39, in which
case subordinate media components 37 inherit the access configuration from the higher scope(s) where it is
not defined specifically for the component. The four controlling criteria for media access are User Identity, Group
membership(s) of user, Read permission for user or group, and Write permission for user or group.
[0054] Accordingly, every user must have a unique identifier within the environment in which the GMA operates, and
the permissions must be defined according to the set of all users and groups within that environment.

[0055] A user may be a human, but can also be a software application, process, or system, typically referred to as
an agent 13. This is especially important both for licensing and for tracking operations performed on data by
automated software agents 13 operating within the GMA 27 environment. Furthermore, any user may belong to one or
more groups, and permissions may be defined for an entire group, and thus for every member of that group.
Consequently, the maintenance overhead in environments with large numbers of users and/or high user turnover (many users
coming and going) is reduced. In a manner similar to the inheritance rules applied by MARS 25, permissions defined
for an explicit user override permissions defined for a group of which the user is a member. For example, if a group is
allowed write permission to a component 37, but a particular user is explicitly denied write permission for that component
37, then the user may not modify the component 37.

[0056] The GMA 27 may also provide Read permission, such that a user or group may retrieve a copy of the data.
Where a lock marker is placed in relation to data, it does not prohibit retrieval of the data, merely modification of that data.
[0057] If access control is not implemented, and/or unless otherwise specified globally for the GMA 27 environment
or for a particular archive, or explicitly defined in the metadata for any relevant scope, a GMA 27 must assume that all
users have read permission to all content.

[0058] Similarly, the GMA 27 may also provide Write permission, which means that the user or group may modify the
data by storing a new version thereof.

[0059] The GMA 27 provides that write permission equates to read permission, such that every user or group which
has write permission to particular content also has read permission. This overrides the situation where the user or
group is otherwise explicitly denied read permission.

[0060] As noted above in relation to read permission, the presence of a lock marker prohibits modification by any user other than
the owner of the lock, including the owner of the component 37 if the lock owner and component owner are different.
Optionally, the GMA 27 may provide a means to defeat locking as a reserved action unavailable to general users. Should
locking be defeated in this manner, the GMA 27 logs the event and notifies the lock owner accordingly.
[0061] Where access control is not implemented, the GMA 27 applies the rule that all users have write permission
to all content. If access control is implemented, and unless otherwise specified globally for the GMA 27 environment
or for a particular archive, or explicitly defined in the metadata for any relevant scope, the GMA 27 must assume that
no users have write permission to any content.

[0062] Regardless of any other metadata-defined access specifications, not including settings defined globally for
the archive, the owner of a component 37 always has write access to that component 37.
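
The interaction of the read, write, ownership and locking rules set out above may be easier to follow in code. The following is a minimal sketch for the case where access control is implemented; the `Component` class, its field names, and the choice that any one allowing group is sufficient when a user belongs to several groups are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Component:
    owner: str
    # Explicit settings: name -> True (allowed) or False (denied).
    user_read: Dict[str, bool] = field(default_factory=dict)
    user_write: Dict[str, bool] = field(default_factory=dict)
    group_read: Dict[str, bool] = field(default_factory=dict)
    group_write: Dict[str, bool] = field(default_factory=dict)
    lock_owner: Optional[str] = None


def _explicit(user, groups, user_perms, group_perms):
    """Return the explicit setting for a user, or None if nothing is defined."""
    if user in user_perms:  # a user-level setting overrides any group setting
        return user_perms[user]
    settings = [group_perms[g] for g in groups if g in group_perms]
    if settings:
        return any(settings)  # assumption: one allowing group is enough
    return None


def has_write_permission(c: Component, user: str, groups: Iterable[str]) -> bool:
    if user == c.owner:  # the owner always has write access
        return True
    explicit = _explicit(user, list(groups), c.user_write, c.group_write)
    return explicit if explicit is not None else False  # default: no write


def can_write(c: Component, user: str, groups: Iterable[str]) -> bool:
    # A lock blocks everyone except the lock owner, including the component owner.
    if c.lock_owner is not None and c.lock_owner != user:
        return False
    return has_write_permission(c, user, groups)


def can_read(c: Component, user: str, groups: Iterable[str]) -> bool:
    groups = list(groups)
    if has_write_permission(c, user, groups):  # write permission implies read
        return True
    explicit = _explicit(user, groups, c.user_read, c.group_read)
    return explicit if explicit is not None else True  # default: read allowed
```

Note that a lock only affects `can_write`; retrieval remains governed by read permission alone.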

[0063] In addition to blanket access control, the GMA 27 may, if access control is enabled, provide a set of access
levels which serve as convenience terms when defining, specifying, or discussing the "functional mode" of a particular
GMA 27 with regard to read and write access control.

[0064] Access levels can be used as configuration values by GMA 27 implementations to specify global access
behaviour for a given GMA 27 where the implementation is capable of providing multiple access levels. At each level
the read and write capability may be predefined, subject to the overriding rule that a read right may never fall below the
corresponding write right.

[0065] The GMA 27 may implement versioning. Through the implementation of versioning, the GMA 27 facilitates
the identification, preservation, and retrieval of particular revisions in the editorial lifecycle of a particular discrete body
of data.

[0066] The versioning model used by the GMA 27, which is further described in Appendix IV Section 4.5, in particular
defines releases as a series of separately managed and independently accessible sequences of revisions. Revisions
are defined as 'snapshots' along a particular release. Where a release is derived from another release, the GMA
27 updates a MARS 25 'source' property to identify from which release and revision the new release stems. Within the
above rules, the GMA 27 is responsible for the linear sequence of revisions within a particular release. The GMA 27 is
responsive to external agent 13 activities that are themselves responsible for the automated or semi-automated creation
or specification of new instances 39 relating to distinct releases. The GMA is also responsive to agent 13 activity relating
to the retrieval of revisions not unique to a particular release. Typically, the creation of new releases will be performed
manually by a human editor, including the specification of 'source' and any other relevant metadata values. Other tools,
external to the GMA 27, may also exist to aid users in performing such operations.

[0067] Versioning is performed by a GMA 27 for the 'data' item 35 of a media component 37 only, and that sequence
of revisions constitutes the editorial history of the data content of the media component 37. The GMA 27 is also
responsible for general management and updating of creation, modification and other time stamp metadata. Storage or
update of items other than the 'data' item 35 neither affects the status of management metadata stored in the 'meta'
item 35 of the component 37 (unless the item 35 in question is in fact the 'meta' item 35 of the component 37) nor is
reflected in the revision history of the component 37. If a revision history or particular metadata must be maintained
for any MARS 25 identifiable body of content, then that content must be identified and managed as a separate media
component 37, possibly belonging to a separate media instance 39.

[0068] Revisions are identified by positive integer values utilising MARS 25 property type Count values. The scope
of each media component 37 is unique, and revision values have significance only within the scope of each particular
media component 37. Revision sequences should begin with the value '1' and proceed linearly and sequentially. The
GMA 27 implementation is free to internally organise and store past revisions in any fashion it chooses.
[0069] The GMA 27 may implement one or both of the following described methods for storing past revisions of the
content of a media component. However, regardless of its internal organisation and operations, the GMA 27 must
return any requested revision as a complete copy.

[0070] One method which the GMA 27 may employ to store past revisions is to generate snapshots. A snapshot is
a complete copy of a given revision at a particular point in time. As such, snapshotting is straightforward to implement,
and possibly time-consuming regeneration operations are not needed to retrieve past revisions. The latter can be very
important in an environment where there is heavy usage and retrieval times are a concern.

[0071] Alternatively, or in conjunction with snapshots, the GMA 27 may store past revisions through a reverse delta
methodology. A delta is a set of one or more editorial operations which can be applied to a body of data to consistently
derive another body of data. A reverse delta is a delta which allows one to derive a previous revision from a later
revision. Rather than store the complete and total content of each revision, the GMA 27 stores the modifications
necessary to derive each past revision from the immediately succeeding later revision. To obtain a specific past revision,
the GMA 27 begins at the current revision, and then applies the reverse deltas in sequence for each previous revision
until the desired revision is reached.

[0072] In a variant of the above, the GMA 27 utilises a forward delta methodology where each delta defines the
operations needed to derive the more recent revision from the preceding revision.
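
The reverse delta methodology can be illustrated with a short sketch. It is not the storage format of any particular GMA 27 implementation: the class `ReverseDeltaStore`, the helper names, and the representation of a delta as a list of (start, end, replacement) edits computed with Python's standard `difflib` module are assumptions chosen for brevity.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Edit operations (start, end, replacement) that turn `source` into `target`."""
    ops = SequenceMatcher(a=source, b=target, autojunk=False).get_opcodes()
    return [(i1, i2, target[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(source, delta):
    """Apply the edits in order, copying the untouched spans between them."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(source[pos:start])
        out.append(replacement)
        pos = end
    out.append(source[pos:])
    return "".join(out)


class ReverseDeltaStore:
    def __init__(self, first_content):
        self.current = first_content
        self.revision = 1          # revision sequences begin with 1
        self.reverse_deltas = {}   # n -> delta deriving revision n from n + 1

    def store(self, new_content):
        """Store a new revision, keeping only a reverse delta for the old one."""
        self.reverse_deltas[self.revision] = make_delta(new_content, self.current)
        self.current = new_content
        self.revision += 1
        return self.revision

    def retrieve(self, revision):
        """Always return a complete copy, never a delta."""
        if not 1 <= revision <= self.revision:
            raise KeyError(revision)
        content = self.current
        # Walk back from the current revision, one reverse delta at a time.
        for n in range(self.revision - 1, revision - 1, -1):
            content = apply_delta(content, self.reverse_deltas[n])
        return content
```

Retrieval cost grows with the distance from the current revision, which is the trade-off against snapshots noted above.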

[0073] The GMA 27 may also implement generation through the dynamic creation of data streams from one or more
existing storage items 35. By way of example, this includes conversions from one encoding or format to another,
extraction of portions of a component's content, auto-generation of indices, tables of contents, bibliographies,
glossaries, and the like as new components 37 of a media instance 39, generation of usage, history, and/or dependency
reports based on metadata values, and generation of metadata profiles for use by one or more registry services.

[0074] The GMA 27 also provides dynamic partitioning, whereby a fragment of the data content is returned in place
of the entire 'data' item, optionally including automatically generated hypertext links to preceding and succeeding
content, and/or information about the structural/contextual qualities of the omitted content, depending on the media
encoding. Dynamic partitioning may be implemented by the GMA 27 irrespective of whether static fragments exist.
Dynamic partitioning is controlled by one or possibly two metadata properties, in addition to those defining the identity of
the source data item. The required property is 'size', which determines the maximum number of bytes which the fragment
can contain starting at the beginning of the data item, whereas the second, optional property is 'pointer', which
defines the point within the data item from which the fragment is extracted. Thus, the GMA 27 extracts the requested
fragment, starting either at the beginning of the data item, where no pointer is defined, or at the point specified by the
pointer value, which may be at the start of the data item if the pointer value is zero. The GMA 27 collects the largest
coherent and meaningful sequence of content up to but not exceeding the specified number of content bytes. What
constitutes a coherent and meaningful sequence will depend on the media encoding of the data and possibly
interpretations inherent in the GMA 27 implementation itself.
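
As a rough sketch of dynamic partitioning, consider plain text in which a blank line separates paragraphs. Treating a paragraph as the "coherent and meaningful" unit, treating the pointer as a byte offset, and the function name `extract_fragment` are all assumptions for this example; a real implementation would depend on the media encoding.

```python
def extract_fragment(data: bytes, size: int, pointer: int = 0) -> bytes:
    """Return at most `size` bytes starting at `pointer`, ending on a paragraph break."""
    if size <= 0 or pointer < 0:
        raise ValueError("size must be positive and pointer non-negative")
    window = data[pointer:pointer + size]
    if pointer + size >= len(data):
        return window  # the remainder of the item fits entirely
    cut = window.rfind(b"\n\n")  # last paragraph boundary inside the window
    if cut == -1:
        return window  # no boundary found: fall back to a hard cut
    return window[:cut + 2]
```

Successive fragments can be requested by advancing the pointer by the length of each returned fragment.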

[0075] A GMA 27 may implement event handling. Accordingly, for each storage item 35, media component 37, media
instance 39, or media object 41, a set of one or more MARS 25 property sets defining some operation(s) can be
associated with each MARS 25 action, such that when that action is successfully performed on that item 35, component
37, instance 39, or object 41, the associated operations are executed. Automated operations are thus defined for the
source data and not for any target data which might be automatically generated as a result of an event-triggered
operation.

[0076] Each operation property set must specify the metadata properties necessary for it to be executed correctly, such
as the action(s) to perform and possibly including the CGI URL of the agency which is to perform the action. The GMA
27 determines how a given operation is to be performed, and by which software component or agent 13, if otherwise
unspecified in the property set(s).

[0077] In the case of a remove action, which will result in the removal of any events defined at the same scope as
the removed data, the GMA 27 will execute any operations associated with the remove action defined at that scope
after successful removal of the data, even though the operations themselves are part of the data removed and will
never be executed again in that context.
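
The ordering described for the remove action can be shown with a small sketch. The class `EventArchive`, its in-memory dictionaries and the `log` list are hypothetical stand-ins; executing an operation is reduced here to recording it.

```python
class EventArchive:
    def __init__(self):
        self.data = {}    # scope -> stored content
        self.events = {}  # scope -> {action: [operation property sets]}
        self.log = []     # operations executed, recorded for illustration

    def on(self, scope, action, operation):
        """Associate an operation property set with an action at a scope."""
        self.events.setdefault(scope, {}).setdefault(action, []).append(operation)

    def _run(self, scope, operations):
        for op in operations:
            self.log.append((scope, op["action"]))

    def store(self, scope, content):
        self.data[scope] = content
        self._run(scope, self.events.get(scope, {}).get("store", []))

    def remove(self, scope):
        # Capture the operations first: removal deletes the event definitions too.
        pending = list(self.events.get(scope, {}).get("remove", []))
        del self.data[scope]           # raises KeyError if nothing is stored
        self.events.pop(scope, None)
        self._run(scope, pending)      # executed only after successful removal
```

Because the pending operations are copied before the scope is deleted, they still run once even though their definitions no longer exist afterwards.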

[0078] The most common type of operation for events is a compound 'generate store' action which generates a new
target item from an input item and stores it persistently in the GMA 27, taking into account all versioning and access
controls in force. By this operation, it is possible to automatically update components such as the toc (Table of Contents)
or index when a data component 37 is modified, or to generate static fragments of an updated data component 37.
[0079] The GMA 27 may associate automated operations globally with any given action, provided the automated
operations are defined in terms of MARS 25 property sets. Automated operations may also be applied within the scope
of the data being acted upon. The GMA 27 may also associate automated operations with triggers other than MARS
25 actions, such as recurring times or days of the week, for purposes such as removing expired data via a
'locate remove' compound action.

[0080] The GMA 27 must also apply the following rules relating to the serialisation and encoding of certain storage
items. Thus, the GMA 27 provides that every 'meta' storage item which is presented to a GMA 27 for storage or returned
by a GMA 27 on retrieval must be a valid XML instance. Metadata property values "contained" within 'meta' storage
items 35 need not be stored or managed internally in the GMA 27 using XML, but every GMA 27 implementation must
accept and return 'meta' items as valid XML instances. In the case of 'data' storage items 35 within 'meta' media
components 37, the serialisation of 'meta' storage items 35 is also used to encode all 'data' storage items 35 for all
'meta' components 37. Although the GMA 27 persistently stores all 'data' storage items 35 literally, it may also choose
to parse and extract a copy of the metadata property values defined within meta component data items 35 to more
efficiently determine inherited metadata properties at specific scopes within the archive 27.

Every 'idmap' storage item which is presented to a GMA 27 for storage or returned by a GMA 27 on retrieval should
be encoded as a Comma Separated Value (CSV) data stream defining a table with two columns, where each row is a
single mapping, the first column/field contains the value of the 'pointer' property defining the symbolic
reference, and the second column/field contains the value of the 'fragment' property specifying the data content fragment
containing the target of the reference, for example:

#EID284828,228
#EID192,12
#EID9926,3281
#EID727,340


[0081] The mapping information "contained" within 'idmap' storage items need not be stored or managed internally
in the GMA 27 in CSV format, but every GMA 27 implementation must accept and return 'idmap' items as CSV-formatted
data streams.
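
A minimal sketch of reading and writing such an 'idmap' stream with Python's standard `csv` module is given below. The function names, the use of a dictionary as the internal form, and the assumption that fragment values are integers are illustrative only.

```python
import csv
import io


def parse_idmap(text):
    """Parse a two-column CSV stream into {pointer: fragment}."""
    mapping = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # tolerate blank lines between rows
        if len(row) != 2:
            raise ValueError(f"expected two columns, got {row!r}")
        pointer, fragment = row
        mapping[pointer.strip()] = int(fragment)  # assumption: integer fragments
    return mapping


def serialise_idmap(mapping):
    """Write {pointer: fragment} back out as a CSV stream, one mapping per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for pointer, fragment in mapping.items():
        writer.writerow([pointer, fragment])
    return buffer.getvalue()
```

Whatever the internal representation, the round trip through these two functions preserves the CSV interchange form.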

[0082] Finally, the GMA 27 must return the complete and valid contents of a given 'data' storage item for a specified
revision (if it exists), regardless of how previous revisions are managed internally. Reverse deltas or other change
summary information which must be applied in some fashion to regenerate or rebuild the desired revision must never be
returned by a GMA 27, even if that is all that is stored internally for each revision data item. Only the complete data
item is to be returned.

[0083] In order to implement the GMA 27 across a physical system 1, the concept of a Portable Media Archive (PMA)
29 has already been introduced. The PMA 29 provides a physical organisational model of a file system based data
repository 21 conforming to and suitable for implementations of the Generalised Media Archive (GMA) 27 abstract
archival model. Appendix III provides further details of the PMA 29.

[0084] The PMA 29 defines an explicit yet highly portable file system organisation for the storage and retrieval of
information based on MARS 25 metadata. Accordingly, the PMA 29 uses the MARS Identity and Item Qualifier metadata
property values themselves as directory and/or file names. Where the GMA 27 utilises a physical organisation model
other than the PMA 29, the PMA 29 may nevertheless be employed by such an implementation as a data interchange
format between disparate GMA 27 implementations and/or as a format for storing portable backups of a given archive
21.

[0085] The PMA 29 is structured physically as a hierarchical directory tree that follows the MARS
object/instance/component/item scoping model. Each media object 41 comprises a branch in the directory tree, each media instance
39 a sub-branch within the object branch 41, each media component 37 a sub-branch within the instance 39, and so
forth. Only MARS Identity and Item Qualifier property values are used to reference the media objects 41 and instances
39. All other metadata properties, as well as the Identity and Qualifier properties, are defined and stored persistently in
'meta' storage items 35, conforming to the serialisation and interchange encodings used by the GMA 27 and referred
to above. Because Identity and Item Qualifier properties must be either valid MARS tokens or integer values, it will be
appreciated by those skilled in the art that any such property value is likely to be an acceptable directory or file name
in all major file systems in use today.
[0086] More particularly, the media object scope is encoded as a directory path consisting of a sequence of nested
directories, one for each character in the media object 'identifier' property value.
[0087] For example:

identifier="dn9982827172" gives d/n/9/9/8/2/8/2/7/1/7/2/

[0088] Identifier values are broken up in this fashion in order to support very large numbers of media objects, perhaps
up to millions or even billions of such objects, residing in a given archive 21. By employing only one character per
directory, the PMA 29 ensures that there will be at most 37 child sub-directories within any given directory level, that is,
one possible sub-directory for each character in the set [a-z0-9_] allowed in MARS token values. Accordingly, the
sub-directory structure satisfies the maximum directory children constraints of most modern file systems. The media object
41 scope may contain media instance 39 sub-scopes or media component 37 sub-scopes, the latter defining
information, metadata or otherwise, which is shared by or relevant to all instances of the media object 41. The media instance
39 scope is encoded as a nested directory sub-path within the media object 41 scope, consisting of one directory
for each of the property values for 'release', 'language', 'coverage', and 'encoding', in that order.
[0089] For example:

release="1" language="en" coverage="global" encoding="xhtml" gives 1/en/global/xhtml/

[0090] The media component 37 scope is encoded as a sub-directory within either the media object 41 scope or
media instance 39 scope and named the same as the 'component' property value.
[0091] For example:

component="meta" gives meta/

[0092] The revision scope, grouping the storage items for a particular revision milestone, is encoded as a directory
sub-path within the media component 37 scope beginning with the literal directory 'revision' followed by a sequence
of nested directories corresponding to the digits in the non-zero-padded revision property value.
[0093] For example:

revision="27" gives revision/2/7/

[0094] The 'data' item 35 for a given revision must be a complete and whole snapshot of the revision, not a partial
copy or set of deltas to be applied to some other revision or item. It must be fully independent of any other storage
item insofar as its completeness is concerned.
[0095] The fragment scope, grouping the storage items for a particular static fragment of the data component content,
is encoded as a directory sub-path within the media component 37 scope or revision scope, beginning with the
literal directory 'fragment' followed by a sequence of nested directories corresponding to the digits in the non-zero-padded
fragment property value.
[0096] For example:

fragment="5041" gives fragment/5/0/4/1/



[0097] The event scope, grouping action-triggered operations for a particular component 37, instance 39, or object
41, is encoded as a directory sub-path within the media component 37 scope, media instance 39 scope, or media
object 41 scope, beginning with the literal directory 'events' and containing one or more files named the same as
the MARS action property values, each file containing a valid MARS XML instance defining the sequence of operations
as ordered property sets.
[0098] For example:

events/store

events/retrieve

events/unlock

[0099] The storage item 35 is encoded as a filename within the media component 37, revision, or fragment scope
and named the same as the 'item' property value.
[0100] For example:

item="data" gives data
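
Taken together, the encoding rules above amount to a simple path-building procedure, sketched below. The function `pma_path` and its keyword arguments are hypothetical, the token pattern is an assumption about the characters permitted in MARS tokens, and the event scope is omitted. Directory paths are returned without the trailing slash used in the examples.

```python
import re

# Assumption: MARS tokens use lower-case letters, digits and underscore only.
TOKEN = re.compile(r"[a-z0-9_]+")


def _token(value):
    value = str(value)
    if not TOKEN.fullmatch(value):
        raise ValueError(f"not a valid token: {value!r}")
    return value


def _digits(value):
    number = int(value)  # int() drops any zero padding
    if number < 0:
        raise ValueError("value must not be negative")
    return list(str(number))


def pma_path(identifier, release=None, language=None, coverage=None,
             encoding=None, component=None, revision=None, fragment=None,
             item=None):
    """Build the archive path for the given identity properties."""
    parts = list(_token(identifier))  # one directory per identifier character
    instance = (release, language, coverage, encoding)
    if any(v is not None for v in instance):
        if any(v is None for v in instance):
            raise ValueError("instance scope needs all four properties")
        parts += [_token(v) for v in instance]
    lower = (revision, fragment, item)
    if component is None and any(v is not None for v in lower):
        raise ValueError("revision, fragment and item require a component")
    if component is not None:
        parts.append(_token(component))
    if revision is not None:
        parts += ["revision", *_digits(revision)]
    if fragment is not None:
        parts += ["fragment", *_digits(fragment)]
    if item is not None:
        parts.append(_token(item))
    return "/".join(parts)
```

For instance, an object "dn99" with a 'meta' component at revision 27 and a 'data' item yields a path ending in meta/revision/2/7/data.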

[0101] The PMA 29 does not have any minimum requirements on the capacities of host file systems, nor absolute
limits on the volume or depth of conforming archives. However, it will be appreciated by those skilled in the art that an
understanding of the variables which may affect portability from one file system to another is important if data integrity
is to be maintained. Nevertheless, the PMA 29 does define the following recommended minimal constraints on a host
file system, which should be met regardless of the total capacity or other capabilities of the file system in question:



File and Directory Name Length | 30
Directory Depth                | 64
Number of Directory Children   | 100



[0102] The above specified constraints are compatible with the following commonly used file systems, which are
therefore suitable for hosting a PMA 29 which also does not exceed the real constraints of the given host file system:
[0103] VFAT (Windows 95/98), NTFS (Windows NT/2000), HFS (Macintosh), HPFS (OS/2), HP/UX, UFS (Solaris),
ext2 (Linux), ISO 9660 Levels 2 and 3 (CDROM), and UDF (CDR/W, DVD).

[0104] These are but a representative sample of file systems which are suitable for hosting a PMA 29. Appendix III
provides an example of file system organisation for a PMA 29.
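
The recommended constraints lend themselves to a simple portability check over the relative paths of an archive. The sketch below is illustrative only: the function name is hypothetical, and counting every path element (including a final file name) towards directory depth is an assumption.

```python
MAX_NAME_LENGTH = 30
MAX_DEPTH = 64
MAX_CHILDREN = 100


def check_archive_paths(paths):
    """Return a list of human-readable constraint violations."""
    problems = []
    children = {}  # parent path -> set of child names
    for path in paths:
        names = [name for name in path.split("/") if name]
        if len(names) > MAX_DEPTH:
            problems.append(f"{path}: depth {len(names)} exceeds {MAX_DEPTH}")
        for depth, name in enumerate(names):
            if len(name) > MAX_NAME_LENGTH:
                problems.append(f"{path}: name {name!r} exceeds {MAX_NAME_LENGTH} characters")
            parent = "/".join(names[:depth])
            children.setdefault(parent, set()).add(name)
    for parent, names in children.items():
        if len(names) > MAX_CHILDREN:
            problems.append(f"{parent or '.'}: {len(names)} children exceed {MAX_CHILDREN}")
    return problems
```

An empty result indicates that the listed paths stay within all three recommended limits.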

[0105] Referring now to Figure 4, in order to facilitate access by agents to the data held within the framework, a
ReGistry Service architecture (REGS) 31 is defined which provides for dynamic query resolution agencies based on
MARS 25, thereby providing a unified interface model for a broad range of search and retrieval tools. Appendix V
provides further details of REGS.

[0106] REGS 31 provides a generic means to interact with any number of specialised search and retrieval tools using
a common set of protocols and interfaces based on the Framework, utilising MARS metadata semantics and either a
POSIX or CGI compliant interface. As with other Framework components, this allows for much greater flexibility in the
implementation and evolution of particular solutions while minimising the interdependencies between the tools and
their users, be they human or software agents 13.

[0107] Being based on MARS 25 metadata allows for a high degree of automation and tight synchronisation with
the archival and management systems used in the same environment, with each registry service deriving its own
registry database 43 directly from the metadata stored in and maintained by the various archives 21 themselves; while
at the same time, each registry service 43 is insulated from the implementation details of and changes in the archives
27 from which it receives 44 its information.

[0108] Referring to Figure 4, each variant of REGS 31 shares a common architecture and fundamental behaviour,
differing only in the actual metadata properties required for its particular application.

[0109] A key feature of the Registry Database 43 architecture is the provision, in every case, of a profile or property
set which, in addition to any non-Identity related properties, explicitly defines the Identity of a specific media object,
media instance, media component, or storage item (possibly a qualified data item).
[0110] Default values for unspecified Identity properties are not applied to a profile, and any given profile may not
have scope gaps in the defined Identity properties (i.e. 'item' defined but not 'component', etc.). Profiles must
unambiguously and precisely identify a media object, instance, component or item.
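
The "no scope gaps" rule can be expressed as a small validation routine. In this sketch the property names follow the Identity properties used for the archive path encoding, but the function `profile_scope`, the treatment of a partially defined instance as an error, and the allowance for components directly under an object are assumptions made for illustration.

```python
INSTANCE_PROPERTIES = ("release", "language", "coverage", "encoding")


def profile_scope(profile):
    """Return the level of scope a profile identifies, rejecting scope gaps."""
    if "identifier" not in profile:
        raise ValueError("a profile must identify a media object")
    scope = "object"
    present = [name in profile for name in INSTANCE_PROPERTIES]
    if any(present):
        # No defaults are applied, so a partial instance identity is ambiguous.
        if not all(present):
            raise ValueError("instance identity is only partially defined")
        scope = "instance"
    if "component" in profile:
        scope = "component"
    if "item" in profile:
        if "component" not in profile:
            raise ValueError("scope gap: 'item' defined but not 'component'")
        scope = "item"
    return scope
```

A registry could apply such a check when profiles are extracted from an archive, before they are added to its database.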

[0111] In addition to identity, the retrieval location of the archive 21 or other repository where that information resides
must be specified using either the 'location' or 'agency' properties. If both are specified, they must define the equivalent
location.

[0112] The additional properties included in any given profile are defined by the registry service operating on or
returning the profile, and a profile may not necessarily contain any additional properties other than those defining Identity and
location.

[0113] In order to access the content held within the framework, the agent 13 or other user creates a search mask
in the form of a query 46. The query 46 is a particular variant of the above described profile set which defines a set of
property values which are to be compared to the equivalent properties in one or more profiles. A query differs from a
regular property set in that it may contain values which may deviate from the MARS 25 specification in the following
ways:

[0114] Properties normally allowing only a single value may have multiple values defined in a query 46.

[0115] The normal interpretation of multiple query values is to apply 'OR' logic such that the property matches if any
of the query values match any of the target values; however, a given registry service is permitted, depending on the
application, to apply 'AND' logic requiring that all query values match a target value, and optionally that every target
value is matched by a query value. Accordingly, it must be clearly specified for a registry service if 'AND' logic is being
applied to multiple query value sets. Furthermore, query values for properties of MARS type String may contain valid
POSIX regular expressions rather than literal strings, in which case the property matches if the specified regular
expression pattern matches the target value. Query values may be prefixed by one of several comparison operators, with
one or more mandatory intervening space characters between the operator and the query value.
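
The 'OR' and 'AND' interpretations of multiple query values might be sketched as follows. The function `property_matches` and its parameters are hypothetical, and Python's `re` module is used here merely as a stand-in for POSIX regular expressions, whose syntax differs in some respects.

```python
import re


def property_matches(query_values, target_values, *, logic="or",
                     regex=False, cover_targets=False):
    """Compare the query values for one property against the target values."""
    if logic not in ("or", "and"):
        raise ValueError("logic must be 'or' or 'and'")

    def hit(query, target):
        if regex:
            # Assumption: a pattern matches if it is found anywhere in the value.
            return re.search(query, target) is not None
        return query == target

    matched = [any(hit(q, t) for t in target_values) for q in query_values]
    if logic == "or":
        return any(matched)
    if not all(matched):
        return False
    if cover_targets:
        # Optional stricter form: every target value is matched by a query value.
        return all(any(hit(q, t) for q in query_values) for t in target_values)
    return True
```

The `cover_targets` flag corresponds to the optional requirement that every target value also be matched by some query value.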
[0116] The order of comparison for binary operators is:

query value {operator} target value

[0117] Not all comparison operators are necessarily meaningful for all property value types, nor are all operators
required to be supported by any given registry service.

[0118] It must be clearly specified for every registry service which, if any, comparison operators are supported in
input queries.

[0119] In the rare case that a literal string value begins with a comparison operator followed by one or more intervening
spaces, the initial operator character should be preceded by a backslash character '\'. The registry service must then
identify and remove the backslash character before any comparisons. Examples of some comparison operators are
given below:

Negation "!"

The property matches if the query value fails to match the target value.

E.g. "! approved".

Less Than "<"

The property matches if the query value is less than the target value.

E.g. "< 2.6".

Greater Than ">"

The property matches if the query value is greater than the target value.

E.g. "> draft".

Less Than or Equal To "<="

The property matches if the query value is less than or equal to the target value.

E.g. "<= 2000-09-22".

Greater Than or Equal To ">="

The property matches if the query value is greater than or equal to the target value.

E.g. ">= 5000".

Wildcard Value Operator

[0120] Any property in a query may have specified for it the special value "*", regardless of property type, which
effectively matches any defined value in any target. The wildcard value does not, however, match a property which has
no value defined for it.

[0121] The wildcard value operator may be preceded by the negation operator.

[0122] The special wildcard operator is particularly useful for specifying the level of Identity scoping of the returned
profiles for a registry 43 which stores profiles for multiple levels of scope. It is also used to match properties where all
that is of interest is that they have some value defined, but it does not matter what the value actually is. Alternatively,
when combined with the negation operator, it is used to match properties which have no value defined. The latter is useful for
validation and quality assurance processes to isolate information which is missing mandatory or critical metadata
properties.

[0123] In the rare case that a literal string value equals the wildcard value operator, the wildcard value operator must
be preceded by a backslash character '\'. The registry service must then identify and remove the backslash character
before any comparisons.
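
The operator prefix, wildcard and backslash-escape rules can be combined into a short parsing and matching sketch. The function names are hypothetical; converting the query value to the Python type of the target value, and treating a non-wildcard query as not matching an undefined target, are assumptions made for the example.

```python
import operator
import re

# An operator must be followed by at least one space to count as an operator.
_PREFIX = re.compile(r"(<=|>=|!|<|>) +(.*)", re.S)
_COMPARE = {"<": operator.lt, ">": operator.gt,
            "<=": operator.le, ">=": operator.ge}


def parse_query_value(raw):
    """Split a raw query value into (operator, value, is_wildcard)."""
    op = None
    prefixed = _PREFIX.fullmatch(raw)
    if prefixed:
        op, raw = prefixed.group(1), prefixed.group(2)
    if raw == "*":
        return op, None, True
    if raw.startswith("\\"):
        rest = raw[1:]
        # Strip the backslash only where it escapes a wildcard or an operator.
        if rest == "*" or _PREFIX.fullmatch(rest):
            raw = rest
    return op, raw, False


def matches(raw_query, target):
    """`target` is None when the property has no value defined."""
    op, value, wildcard = parse_query_value(raw_query)
    if wildcard:
        if op not in (None, "!"):
            raise ValueError("only negation may precede the wildcard")
        defined = target is not None
        return defined if op is None else not defined
    if target is None:
        return False  # assumption: nothing to compare against
    typed = type(target)(value)  # compare like with like
    if op is None:
        return typed == target
    if op == "!":
        return typed != target
    # Order of comparison: query value {operator} target value.
    return _COMPARE[op](typed, target)
```

Under the stated order of comparison, a query value of "< 2.6" matches targets greater than 2.6, since the query value stands on the left of the operator.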

[0124] Each variant of REGS 31 has the following commonality of architecture, which is defined by the metadata
properties it allows and requires in each profile, the metadata properties it allows and requires in a given search query,
and whether returned profiles are scored and ordered according to relevance. These three criteria define the interface
by which the registry service interacts with all source archives and all users.

[0125] A particular registry service will extract from a given archive 27, or be provided by or on behalf of the archive,
the profiles for all targets of interest which a user may search on, containing all properties defined for each target
which are relevant to the particular registry 43. These profiles are stored in the database 43. Depending on the nature
of the registry 43, this may include profiles for abstract media objects 41, media instances, and media components
37 as well as physical storage items 35 or even qualified data items. Some property values for a profile may be
dynamically generated specifically for the registry 43, such as by the automated identification or extraction of keywords or
index terms from the data content, or similar operations.

[0126] The profiles from several archives 21 may be combined by the registry service into a single search space 43
for a given application or environment. The location and/or agency properties serve to differentiate the source locations
of the various archives 21 from which the individual profiles originate.

[0127] All registry services 43 define and search over profiles, and those profiles define bodies of information at
either an abstract or physical scope; i.e. media objects 41, media instances 39, media components 37, or storage
items 35. A given registry database might contain profiles for only a single level of scope or for several levels of scope.

[0128] If a query 46 does not define any Identity properties, then the registry service, via a query resolution engine
45, must return 48 all matching profiles regardless of scope; however, if the query 46 defines one or more Identity
properties, then all profiles returned 48 by the engine 45 must be of the same level of scope as the lowest scoped
Identity property defined in the search query 46.
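
The scope rule of the preceding paragraph can be sketched as below. The mapping from property names to scope levels is an illustrative assumption (the actual MARS Identity property names may differ), as are the function names; a profile's own scope is taken here to be the lowest Identity property it defines.

```python
# Scope levels from highest to lowest.
SCOPE_ORDER = ["object", "instance", "component", "item"]

# Hypothetical Identity property names and the scope level each one implies.
IDENTITY_SCOPE = {
    "identifier": "object",
    "language": "instance",
    "component": "component",
    "item": "item",
}


def lowest_scope(properties):
    """Return the lowest scope named by any Identity property, or None if there is none."""
    scopes = [IDENTITY_SCOPE[name] for name in properties if name in IDENTITY_SCOPE]
    if not scopes:
        return None
    return max(scopes, key=SCOPE_ORDER.index)


def filter_by_scope(profiles, query):
    """Keep all profiles if the query names no Identity property, else only those at its lowest scope."""
    scope = lowest_scope(query)
    if scope is None:
        return list(profiles)
    return [p for p in profiles if lowest_scope(p) == scope]
```

With this sketch, a query such as `{"component": "meta", "item": "*"}` resolves to storage item scope, so only profiles that define an item are returned.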






EP 1 244 032 A1 

[0129] Note that a specific level of scope can be specified in a query 46 by using the special wildcard value for
the scope of interest (e.g. "component=meta item=* ..." to find all storage items within meta components which
otherwise match the remainder of the query).

[0130] Each set of profiles returned for a given search may be optionally scored and ordered by relevance by the
engine 45, according to how closely they match the input query 46. The score must be returned as a value to the MARS
'relevance' property. The criteria for determining relevance is up to each registry service 43, but it must be defined as
a percentage value where zero indicates no match whatsoever, 100 indicates a "perfect" match (however that is defined
by the registry service), and a value between zero and 100 reflects the closeness of the match proportionally. The
scale of relevance from zero to 100 is expected to be linear.
[0131] A registry service 43 can be directed by a user, or by implementation, to apply two types of thresholds to
constrain the total number of profiles 48 returned by a given search 46. Both thresholds may be applied together to
the same search results. The MARS 'size' property can be specified in the search query (or applied implicitly by the
registry service) to define the maximum number of profiles to be returned 48. In the case that profiles are scored and
ordered by relevance, the maximum number of profiles are to be taken from the highest scoring profiles.

[0132] Similarly, the MARS 'relevance' property can be specified in the search query 46 (or applied implicitly by the
registry service) to define the minimum score which must be equalled or exceeded by every profile returned. In this
regard, specifying a minimum relevance of 100 requires that targets match perfectly, allowing the user or agent to select
between best match and absolute match.
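
The two thresholds can be sketched as a single post-processing step. This is an illustration only: the function name `apply_thresholds` and the representation of a profile as a dictionary with a numeric 'relevance' entry are assumptions.

```python
def apply_thresholds(profiles, size=None, relevance=None):
    """Order scored profiles by relevance, then apply the optional thresholds.

    profiles  -- list of dicts, each holding a 'relevance' score from 0 to 100
    size      -- maximum number of profiles to return (None means unlimited)
    relevance -- minimum score every returned profile must equal or exceed
    """
    ranked = sorted(profiles, key=lambda p: p["relevance"], reverse=True)
    if relevance is not None:
        ranked = [p for p in ranked if p["relevance"] >= relevance]
    if size is not None:
        # The profiles kept are taken from the highest scoring ones.
        ranked = ranked[:size]
    return ranked
```

Passing `relevance=100` returns only perfect matches, which corresponds to the "absolute match" mode; omitting it gives the "best match" mode.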

[0133] All property sets (including profiles and queries) which are received/imported by and returned/exported from
a registry service via a data stream must be encoded as XML instances conforming to the MARS DTD. This includes
sets of profiles extracted from a given archive 44, search queries received from client applications 46, and sets of
profiles returned as the results of a search 48.

[0134] If multiple property sets are defined in a MARS XML instance provided as a search request 46, then each
property set is processed as a separate query 46, and the results of each query 46 returned 48 in the order specified,
combined in a single XML instance. Any sorting or reduction by specified thresholds is done per each query only 46.
The results 48 from the separate queries 46 are not combined in any fashion other than concatenated into the single
returned XML instance.
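
The per-query processing described above can be sketched as follows, leaving the XML encoding aside. The names `resolve_all` and `search` are hypothetical; `search` stands for whatever routine resolves a single property set, including any sorting and thresholds for that query.

```python
def resolve_all(property_sets, search):
    """Resolve each property set as a separate query and concatenate the results.

    property_sets -- the queries, in the order given in the request
    search        -- callable resolving ONE query to its own (sorted, thresholded) result list
    """
    results = []
    for query in property_sets:
        # No merging, re-sorting or de-duplication across queries.
        results.extend(search(query))
    return results
```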

[0135] Every registry service may organise and manage its internal registry database 43 using whatever means is
optimal for that particular service; it is not required to utilise or preserve any XML encoding of the profiles.

[0136] Most registry services 43 will include an additional CGI or other web based component 47 which provides a
human-usable interface for a terminal 49 operable for specifying queries 46 and accessing search results 48. This will
typically act as a specialised proxy to the general registry service, converting the user specified metadata 50 to a valid
MARS query 46' and then mapping the returned XML 48' instance containing the target profiles to HTML 52 for viewing
and selection. The interface or proxy component 47 preferably provides the following functionality in delivering results
to the user.

[0137] The set of returned profiles should be presented as a sequence of links, preserving any ordering based on
relevance scoring.

[0138] Each profile link should be encoded as an (X)HTML 'a' element within a block element or other visually distinct
element ('p', 'li', 'td', etc.).

[0139] The URL value of the 'href' attribute of the 'a' element should be constructed from the profile, based on the
'location' and/or 'agency' properties, which will resolve to the content of (or access interface for) the target.

[0140] If the 'relevance' property is defined in the profile, its value should begin the content of the 'a' element,
differentiated clearly from subsequent content by punctuation or structure such as parentheses, comma, colon, separate
table column, etc.

[0141] If the 'title' property is defined in the profile, its value should complete the content of the 'a' element. Otherwise,
a (possibly partial) MRN should be constructed from the profile and complete the content of the 'a' element.
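
The presentation rules above can be sketched as a small rendering routine. It is an illustration under stated assumptions: profiles are plain dictionaries, the 'location' value is already a resolvable URL, and the fallback key 'mrn' is a hypothetical stand-in for an MRN constructed from the profile.

```python
from html import escape


def render_profile(profile):
    """Render one profile as a 'p' block element holding a single 'a' link."""
    href = escape(profile["location"], quote=True)
    # The title completes the link content; otherwise fall back to the (assumed) MRN.
    label = profile.get("title") or profile.get("mrn", "")
    if "relevance" in profile:
        # The score begins the link content, set apart by parentheses.
        label = "(%d) %s" % (profile["relevance"], label)
    return '<p><a href="%s">%s</a></p>' % (href, escape(label))


def render_results(profiles):
    """Render profiles in the order received, preserving any relevance ordering."""
    body = "\n".join(render_profile(p) for p in profiles)
    return "<html>\n<body>\n%s\n</body>\n</html>" % body
```

The sketch escapes the ampersand in the URL as `&amp;`, which is what HTML attribute syntax strictly requires.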
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Examples: 
[0142] 

<html>
<body>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(98) Foo</a>
</p>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(87) Bar</a>
</p>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(37) Bas</a>
</p>
</body>
</html>

<html>
<body>
<table>
<tr>
<th>Score</th>
<th>Target</th>
</tr>
<tr>
<td>98</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Foo</a></td>
</tr>
<tr>
<td>87</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Bar</a></td>
</tr>
<tr>
<td>37</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Bas</a></td>
</tr>
</table>
</body>
</html>
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In order to assist still further in understanding this aspect of the invention, a number of different examples of 
REGS 31 suited to particular activities are set out below. In each case, a brief description is provided, as well as a
specification of which metadata properties are required or allowed for profiles and for queries. It is to be noted that the
'action' property is required to be specified with the value 'locate' in all registry service queries; therefore it is not included
in the required query property specifications for each registry service. Likewise, the 'relevance' and 'size' properties
are allowed for all input queries to all registry services; therefore they are also not explicitly listed in the allowed query
property specifications for each registry service.

[0144] Metadata Registry Service (META-REGS) provides for searching the complete metadata property sets
(including inherited values) for all identifiable bodies of information, concrete or abstract; including media objects, media
instances, media components, storage items and qualified data items.

[0145] The results of a search are a set of profiles defining zero or more targets at the lowest level of Identity scope
for which there is a property defined in the search query. All targets in the results will be of the same level of scope
even if the registry database contains targets at all levels of scope.

[0146] The wildcard operator can be used to force a particular level of scope in the results. E.g. to define media
instance scope, only one instance property need be defined with the wildcard operator value (e.g. "language=*"); to
define media component scope, the component property can be defined with the wildcard operator value (e.g.
"component=*"); etc. The registry service may not require nor expect that any particular instance property be used nor that
only one property be used. It is not permitted for two or more Instance properties to have both wildcard and negated
wildcard operator values in a given input query.

[0147] The default behaviour is to provide the best matches for the specified query; however, by defining in the input
query a value of 100 for the 'relevance' property, the search results will only include those targets which match the
query perfectly. The former is most useful for general browsing and exploration of the information space and the latter
for collection and extraction of specifically defined data.

[0148] Required profile properties for META-REGS include all Identity properties required to uniquely identify the
body of information in question, as well as either the 'location' or 'agency' property.

[0149] Allowed profile properties for META-REGS include any valid MARS property, in this case being all defined
MARS properties applicable to the body of information in question. It is recommended that the 'title' property be defined
for all profiles, whenever possible.

[0150] There are no required query properties for META-REGS although at least one property must be specified in
the search query other than the 'action' property.

[0151] Allowed query properties for META-REGS include any valid MARS property.

[0152] Content Registry Service (CON-REGS) provides for searching the textual content of all media instances within 
the included archives. It corresponds to a traditional "free-text index" such as those employed by most web sites. 
[0153] The results of a search are a set of profiles defining zero or more data component data storage items or 
qualified data items. 

[0154] Profiles are defined only for data storage items and qualified data items (e.g. fragments) which belong to the
data component of a media instance. Other components and other items belonging to the data component are not to
be included in the search space of a CON-REGS registry service. Note that in addition to actual fragment items, profiles
for "virtual" fragments can be defined using a combination of the 'pointer' and (if needed) 'size' properties, where
appropriate for the media type (e.g. for specific sections of an XML document instance).

[0155] For each data item, the 'keywords' property is defined as the unique, minimal set of index terms for the item,
typically corresponding to the morphological base forms (linguistic forms independent of inflection, derivation, or other
lexical variation) excluding common "stop" words such as articles ("the", "a"), conjunctions ("and", "whereas"), or
semantically weak words ("is", "said"), etc. It is expected that the same tools and processes for distilling arbitrary input
into minimal forms are applied both in the generation of the registry database as well as for all relevant input query
values.

[0156] The scope of the results, such as whole data items versus fragments, can be controlled using the 'fragment'
property and the wildcard value operator "*" for the scope of interest. E.g., "fragment=*" will force the search to only
return profiles of matching fragments and not of whole data items; whereas "fragment=!*" will only return profiles of
matching whole data storage items. If otherwise unspecified, all matching profiles for all items will be returned, which
may result in redundant information being identified.

[0157] A human user interface will likely hide the definition of the 'fragment' property behind a more mnemonic
selection list or set of checkboxes, providing a single field of input for the query keywords.

[0158] If a given value for the 'keywords' property contains multiple words separated by white space, then all of the
words must occur adjacent to one another in the order specified in the target content. Note that this is not the same
as multiple property values where each value contains a single word. The set of all property values (string set) constitute
an OR set, while the set of words in a single property value (string) constitute a sequence (phrase) in the target. White
space sequences in the query property value can be expected to match any white space sequence in the target content,
even if those two sequences are not identical (i.e. a space can match a newline or tab, etc.).
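
The phrase and OR-set semantics above can be sketched with regular expressions. This is a simplified illustration: the function names are hypothetical, matching is done on whitespace-delimited tokens of the raw content, and the distillation into base forms described earlier is left out.

```python
import re


def phrase_pattern(value):
    """Compile one 'keywords' value into a pattern matching its words as an adjacent sequence."""
    words = value.split()
    # Any run of white space in the target may separate consecutive words.
    body = r"\s+".join(re.escape(w) for w in words)
    # Lookarounds keep the phrase aligned to whole whitespace-delimited tokens.
    return re.compile(r"(?<!\S)" + body + r"(?!\S)")


def keywords_match(values, content):
    """True if ANY value (the OR set) occurs in the content as a phrase."""
    return any(phrase_pattern(v).search(content) for v in values if v.split())
```

For instance, the value "query resolution" matches content in which the two words are separated by a newline, while "resolution query" does not match.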

[0159] A human user interface 47 provides a mechanism for defining multiple 'keywords' property values as well as
for differentiating between values having a single word and values containing phrases or other white space delimited
sequences of words. In the interest of consistency across registry services, when a single value input field is provided
for the 'keywords' or similar property, white space is used to separate multiple values by default and multi-word values
are specially delimited by quotes to indicate that they constitute the same value (e.g. the field [a b "c1 c2 c3" d] defines
four values, the third of which has three words).
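
The single-field convention can be sketched as a small tokenizer. The function name is hypothetical, and the handling of unbalanced or empty quotes is an assumption, since the text does not address those cases.

```python
import re

# Either a double-quoted run (captured without the quotes) or a bare run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_keywords_field(field):
    """Split a single input field into 'keywords' values; quotes group a multi-word value."""
    values = []
    for quoted, bare in _TOKEN.findall(field):
        value = quoted if quoted else bare
        if value:  # skip empty quoted strings
            values.append(value)
    return values
```

Applied to the field `a b "c1 c2 c3" d`, the sketch yields the four values `a`, `b`, `c1 c2 c3` and `d`.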

[0160] It is permitted for special operators or commands to CON-REGS to be interspersed within the set of 'keywords'
values, such as those controlling boolean logic, maximal or minimal adjacency distances, etc. It is up to the registry
service to ensure that no ambiguity arises between CON-REGS operators and actual values or between REGS special
operators and CON-REGS operators. REGS special operators always take precedence over any CON-REGS
operators.

[0161] Required CON-REGS profile properties are all Identity and Qualifier properties required to uniquely identify
each data storage item or qualified data item in question; either the 'location' or 'agency' property; and the 'keywords'
property containing a unique, minimal set of index terms for the item in question.

[0162] Allowed CON-REGS profile properties are all required properties, as well as the 'title' property (recommended).

[0163] Required CON-REGS query properties are the 'keywords' property containing the set of index terms to search
on, which may need to be distilled into a unique, minimal set of base forms by the registry service.

[0164] Allowed CON-REGS query properties are all required properties, as well as the 'fragment' property with either
wildcard value or negated wildcard value only.

[0165] Typological Registry Service (TYPE-REGS) provides for searching the set of 'class' property values (including
any inherited values) for all media instances according to the typologies defined for the information contained in the
included archives.

[0166] The results of a search are a set of profiles defining zero or more media instances. 

[0167] In addition to the literal matching of property values, such as provided by META-REGS, TYPE-REGS also
matches query values to target values taking into account one or more "IS-A" type hierarchies as defined by the
typologies employed, such that a target value which is an ancestor of a query value also matches (e.g. a query value of
"dog" would be expected to match a target value of "animal"). If only exact matching is required (such that e.g. "dog"
only matches "dog") then META-REGS should be used.
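
Ancestor matching over an "IS-A" hierarchy can be sketched as follows. The typology shown is a made-up example, and the function names are hypothetical; a real registry would load its hierarchies from the typologies in use.

```python
# Hypothetical typology, stored as child -> parent links.
IS_A = {"dog": "mammal", "cat": "mammal", "mammal": "animal"}


def ancestors(value, hierarchy):
    """Return the chain of ancestors of a value, nearest first."""
    chain = []
    # The second condition guards against accidental cycles in the typology.
    while value in hierarchy and hierarchy[value] not in chain:
        value = hierarchy[value]
        chain.append(value)
    return chain


def class_matches(query_value, target_value, hierarchy=IS_A):
    """True if the target equals the query value or is one of its ancestors."""
    return target_value == query_value or target_value in ancestors(query_value, hierarchy)
```

Note that the relation is directional: a query for "dog" matches a target classified as "animal", but a query for "animal" does not match a target classified as "dog".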

[0168] TYPE-REGS does not differentiate between classification values which belong to different typologies, nor does
it resolve any ambiguity which may arise from a single value being associated with multiple typologies with possibly
differing semantics. It is only responsible for efficiently locating all media instances which have defined values matching
those in the input query. If conflicts arise from the use of multiple typologies within the same environment, it is recommended
that separate registry databases be generated and referenced for each individual typology.

[0169] Required TYPE-REGS profile properties are those Identity properties which explicitly and completely define
the media instance, one or more values defined for the 'class' property, as well as either the 'location' or 'agency'
property.

[0170] Allowed TYPE-REGS profile properties are all required properties, as well as the 'title' property (recommended).

[0171] Required TYPE-REGS query properties are the 'class' property containing the set of classifications to search.

[0172] Allowed TYPE-REGS query properties are restricted to the 'class' property, which is the only property allowed
in TYPE-REGS search queries.

[0173] Dependency Registry Service (DEP-REGS) provides for searching the set of Association property values
(including any inherited values) which can be represented explicitly using MARS identity semantics for all bodies of
information in the included archives.

[0174] The results of a search are a set of profiles defining zero or more targets matching the search query. 
[0175] DEP-REGS is used to identify relationships between bodies of information within a given environment such
as a document which serves as the basis for a translation to another language or a conversion to an alternate encoding,
a high level diagram which summarises the basic characteristics of a much more detailed low level diagram or set of
diagrams, a reusable documentation component which serves as partial content for a higher level component, etc.
The ability to determine such relationships, many of which may be implicit in the data in question, is crucial for managing
large bodies of information where changes to one media instance may impact the validity or quality of other instances.
[0176] For example, to locate all targets which immediately include a given instance in their content, one would
construct a query containing the 'includes' property with a value consisting of a URI identifying the instance, such as
an MRN. DEP-REGS would then return profiles for all targets which include that instance as a value of their 'includes'
property. Similarly, to locate all targets which contain referential links to a given instance, one would construct a query
containing the 'refers' property with a value identifying the instance.

[0177] DEP-REGS can be seen as a specialised form of META-REGS, based only on the minimal set of Identity and
Association properties. Furthermore, in contrast to the literal matching of property values such as performed by
META-REGS, DEP-REGS matches Association query values to target values by applying on-the-fly mapping between all
equivalent URI values when making comparisons; such as between an MRN and an Agency CGI URL, or between
two non-string-identical Agency CGI URLs which both define the same resource (regardless of location). Note that if
the META-REGS implementation provides such equivalence mapping of URI values, then a separate DEP-REGS
implementation is not absolutely required; though one may be still employed on the basis of efficiency, given the highly
reduced number of properties in a DEP-REGS profile.

[0178] Required DEP-REGS profile properties are the identity properties which explicitly and completely define the
body of information, all defined Association properties, as well as either the 'location' or 'agency' property.

[0179] Allowed DEP-REGS profile properties are all required properties, as well as the 'title' property (recommended).

[0180] Required DEP-REGS query properties are one or more Association properties.

[0181] Allowed DEP-REGS query properties are one or more Association properties.

[0182] Process Registry Service (PRO-REGS) provides for searching over sequences of state or event identifiers
(state chains) which are associated with specific components of or locations within procedural documentation or other
forms of temporal information.

[0183] The results of a search are a set of profiles defining zero or more targets matching the search query. 

[0184] PRO-REGS can be used for, among other things, "process sensitive help" where a unique identifier is
associated with each significant point in procedures or operations defined by procedural documentation, and software which
is monitoring, guiding, and/or managing the procedure keeps a record of the procedural states activated or executed
by the user. At any time, the running history of executed states can be passed to PRO-REGS as a query to locate
documentation which most closely matches that sequence of states or events, up to the point of the current state, so
that the user receives precise information about how to proceed with the given procedure or operation exactly from
where they are. The procedural documentation would presumably be encoded using some form of functional mark-up
(e.g. SGML, XML, HTML), and the profiles identifying paths to states or steps in the procedural documentation
would be automatically generated based on analysis of the data content, recursively extracting the paths of
special state identifiers embedded in the mark-up and producing a profile identifying a qualified data item to each
particular point in the documentation using the 'pointer' property.
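
One possible way to score a profile's state chain against the user's running history is sketched below. The scoring rule is an assumption made for illustration, since the criteria for relevance are left to each registry service: it counts how many states agree when both chains are read backwards from the current state. The function names and the dictionary representation of profiles are likewise hypothetical.

```python
def chain_relevance(history, chain):
    """Score 0..100 for how closely a profile's state chain matches the executed history."""
    matched = 0
    # Compare from the current (last) state backwards until the chains diverge.
    for executed, documented in zip(reversed(history), reversed(chain)):
        if executed != documented:
            break
        matched += 1
    longest = max(len(history), len(chain))
    return 100 * matched // longest if longest else 0


def best_profile(history, profiles):
    """Return the profile whose 'class' state chain scores highest, or None if there are none."""
    if not profiles:
        return None
    return max(profiles, key=lambda p: chain_relevance(history, p["class"]))
```

Under this rule a chain identical to the history scores 100, and a chain that ends in a different state than the current one scores zero.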

[0185] Required PRO-REGS profile properties are the identity properties which explicitly and completely define the
body of information, the 'class' property defining the sequence of state identifiers up to the information in question, as
well as either the 'location' or 'agency' property.

[0186] Allowed PRO-REGS profile properties are all required properties, as well as the 'title' property (recommended).

[0187] Required PRO-REGS query properties are the 'class' property defining a sequence of state identifiers based
on user navigation history.

[0188] Allowed PRO-REGS query properties are restricted solely to the 'class' property allowed in search queries.
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1 Scope 

This document defines the Metia Framework for Electronic Media, a generalized metadata
driven framework for the management and distribution of electronic media.

2 Overview 

The Metia Framework defines a set of standard, open and portable models, interfaces, and
protocols facilitating the construction of tools and environments optimized for the
management, referencing, distribution, storage, and retrieval of electronic media; as well as
a set of core software components (agents) providing functions and services relating to
archival, versioning, access control, search, retrieval, conversion, navigation, and metadata
management.

The Metia Framework is designed to embody the following qualities and characteristics:

open 

The framework is based on open standards and proven technologies wherever possible, 
and all framework specific properties and characteristics are fully documented. 

scalable 

Environments based on the framework should function equally well with both few and
many agents, on a single machine or across a distributed network, and on both small
and large systems; where performance issues are primarily tied to the properties and
capabilities of the individual agents and/or systems and network bandwidth, and not to
properties of the framework itself.

modular 

All agents within a given environment interact efficiently and effectively with one
another with little to no specialized configuration and with no special knowledge of the
implementation details of particular agents.

portable

Agents conforming to the framework can be implemented on a broad range of
platforms using practically any tools, programming languages, or other means. The
core software components provided by the framework itself are implemented in Java,
providing maximal portability to different platforms and environments.

distributed 

Agents are not limited to data or the services of other agents running on the same
machine, but may interact (often transparently) with agents running on any machine
which is accessible over the network.

reusable

The framework provides for maximal use and reuse of existing software components
and agents, where more complex agents are implemented using the services of more
specialized agents. This allows refinement and extension of processes with little to no
modification to any existing implementation.

extensible

Additional agents may be added to any environment based on the framework with little
to no impact to and/or reconfiguration of any existing agents.
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3 Related Documents, Standards, and Specifications 

3.1 Media Attribution and Reference Semantics (MARS)

Media Attribution and Reference Semantics (MARS), a component of the Metia
Framework, is a metadata specification framework and core standard vocabulary and
semantics facilitating the portable management, referencing, distribution, storage and
retrieval of electronic media.

http://metia.nokia.com/specifications/#MARS

3.2 Generalized Media Archive (GMA)

The Generalized Media Archive (GMA), a component of the Metia Framework, defines an
abstract archival model for the storage and management of data based solely on Media
Attribution and Reference Semantics (MARS) metadata; providing a uniform, consistent,
and implementation independent model for information storage and retrieval, versioning,
and access control.

http://metia.nokia.com/specifications/#GMA

3.3 Portable Media Archive (PMA)

The Portable Media Archive (PMA), a component of the Metia Framework, is a physical
organization model of a file system based data repository conforming to and suitable for
implementations of the Generalized Media Archive (GMA) abstract archival model.

http://metia.nokia.com/specifications/#PMA

3.4 Registry Service Architecture (REGS)

The Registry Service Architecture (REGS), a component of the Metia Framework, is a
generic architecture for dynamic query resolution agencies based on the Metia Framework
and Media Attribution and Reference Semantics (MARS), providing a unified interface
model for a broad range of search and retrieval tools.

http://metia.nokia.com/specifications/#REGS

3.5 HyperText Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed,
collaborative, hypermedia information systems. It is a generic, stateless protocol which can
be used for many tasks beyond its use for hypertext, such as name servers and distributed
object management systems, through extension of its request methods, error codes and
headers. A feature of HTTP is the typing and negotiation of data representation, allowing
systems to be built independently of the data being transferred.

The Metia Framework distributed collaboration model is based primarily on HTTP.

http://www.w3.org/Protocols/rfc2616/rfc2616.html
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3.6 Common Gateway Interface (CGI) 

The Common Gateway Interface (CGI) is a standard for interfacing external applications
with information servers, such as Web servers. Within the new Metia Framework, CGI will
serve as the primary communication mechanism between networked clients and software
agents.

http://hoohoo.ncsa.uiuc.edu/cgi/overview.html

3.7 Portable Operating System Interface (POSIX)

POSIX (Portable Operating System Interface) is a set of standard operating system
interfaces based on the UNIX operating system. The POSIX interfaces were developed
under the auspices of the IEEE (Institute of Electrical and Electronics Engineers).

The Metia Framework adopts the POSIX models for command line arguments, standard
input streams, standard output streams, and standard error streams.

http://standards.ieee.org/catalog/olis/index.html

3.8 CORBA 

CORBA specifies a system which provides interoperability between objects in a
heterogeneous, distributed environment and in a way transparent to the programmer. Its
design is based on the OMG Object Model.

Metia Framework agents may utilize CORBA as one of several means of agent
intercommunication.

http://www.omg.org/technology/documents/new_formal/corba.htm

3.9 Java

Java is both a programming language and a platform. Java is a high-level programming
language that claims to be simple, architecture-neutral, object-oriented, portable,
distributed, high-performance, interpreted, multithreaded, robust, dynamic, and secure. The
Java platform is a "virtual machine" which is able to run any Java program on any machine
for which an implementation of the Java virtual machine (JVM) exists, which is most
operating systems commonly in use today.

The core software components and agents provided by the Metia Framework are
implemented in Java.

http://java.sun.com/docs/index.html

3.10 W3C TR REC-xml: XML (Extensible Markup Language)

The Extensible Markup Language (XML) describes a class of data objects called XML
documents and partially describes the behavior of computer programs which process them.
XML is an application profile or restricted form of SGML, the Standard Generalized
Markup Language. By construction, XML documents are conforming SGML documents.

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XML is used for the serialization, interchange, and (typically) persistent storage of MARS 
metadata property sets. The Metia Java SDK provides for the importation and exportation 
of MARS XML encoded instances to and from MARS class instances. 

http://www.w3.org/TR/REC-xml

3.11 W3C TR rdf-syntax: RDF (Resource Description Framework) 

The Resource Description Framework (RDF) is a foundation for processing metadata; it
provides interoperability between applications that exchange machine-understandable
information in a distributed environment.

The Metia Framework uses RDF for defining the semantics of metadata properties.

http://www.w3.org/TR/REC-rdf-syntax/

3.12 W3C TR rdf-schema: RDF Schemas 

RDF Schemas provides information about the interpretation of the statements given in an 
RDF data model and may be used to specify constraints that should be followed by these 
data models. 

The Metia Framework uses RDF Schemas for relating metadata properties and values to
disjunct but synonymous vocabularies such as Nokia Metadata for Documents and the
Dublin Core.

http://www.w3.org/TR/rdf-schema/
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4 Key Terms and Concepts 

4.1 Agent

An agent is a software application which conforms to the interface and protocol
requirements defined by this specification, and which provides one or more specific and
well defined services or operations.

Per the general qualities derived from the Metia Framework, every agent can be said to
exhibit the following two qualities:

modular 

The implementation details of the agent are hidden behind the generic interfaces and
protocols of the framework, such that any other agent, user, client, or process can
interact with the agent without any privileged knowledge of its internal workings.

distributed 

Every agent is accessible over the network from any system which has access to the 
system on which the agent resides. 

In addition to the above, an agent may also exhibit one or more of the following qualities: 

intelligent 

An agent may be sensitive to the environment, system, or particular context in which it 
is operating, automatically adjusting its behavior accordingly. 

replicating 

An agent may create copies of itself to optimize processing of a given operation by 
assigning a portion of the task to each copy, which (depending on the underlying system) 
may be executed in parallel. 

persistent 

An agent may remain in memory and function beyond the duration of a single 
operation, maintaining information from previous operations which may optimize or 
otherwise facilitate subsequent operations. 

collaborative 

An agent may utilize the services of other agents to perform an operation, and 
management of available agents and their services may be handled by a specialized 
"broker" agent with which available agents register. A collaborative agent is typically 
also a persistent agent. 

mobile 

An agent may move from machine to machine (create a copy of itself on another 
machine and then terminate), if needed to accomplish a given operation (such as 
updating information in a variety of locations). A mobile agent is typically also a 
persistent, replicating agent. 
4.2 Agency 

An agency is a set of specific and well defined services and/or operations typically 
implemented by a set of agents (or other software components, systems, or tools) which are 
organized under and accessed via a single managing agent. 

Technically, every agent can be viewed as an agency. The difference is primarily one of 
perspective. An agency is the abstract functionality and behavior embodied in (or provided 
via) an agent. The agent itself may be nothing more than a proxy to some other system or 
service (such as an RDBMS application) which actually implements those services. Thus, 
while the agent may essentially provide the full range of functionality defined for an 
agency, it may not implement the full functionality of the agency itself. 



5 Framework Architecture 

The Metia Framework architecture is based on a standard web server running on a platform 
which provides the basic POSIX command line and standard input/output stream 
functionality (see diagram on next page). 

One of the goals of the framework is to be media neutral, such that the particular encoding 
of any data is not relevant to storage by or interchange between agents. This does not mean 
that specific encodings or other media constraints may not exist for any given environment 
implementing the framework, depending on the operating system(s), tools, and processes 
used, only that the framework itself aims not to impose any such constraints. 

Every agent conforming to the framework must provide two interfaces: (1) HTTP+CGI, and 
(2) POSIX command line + standard input/output/error. In addition to these, an agent may 
also provide interfaces based on (3) Java method invocation and/or (4) CORBA method 
invocation. These interfaces are defined in greater detail below. Any given agent (or other 
user, client, or process) is free to choose among the available interfaces provided by an 
agent, using whichever is best suited to the particular context or application. 

Non-agent systems, processes, tools, or services which are utilized by an agent can still be 
accessed via proprietary means if necessary or useful for any operations or processes 
outside of the scope of the framework. Thus, framework based tools and services can 
co-exist freely with other tools and services utilizing the same resources. 
5.1 Framework Protocols and Interfaces 

5.1.1 Media Attribution and Reference Semantics (MARS) 

MARS is the language by which agents communicate and is the "heart" of the Metia 
Framework. All other protocols and interfaces defined by the framework are merely a 
means to transfer data streams which are defined, directed, and controlled by MARS 
metadata. See section 6.1 and the separate MARS specification. 

5.1.2 POSIX 

The framework adopts the POSIX standard specifications for command line arguments, 
standard input stream, standard output stream, and standard error stream as the primary 
local (system internal) interface used for agent intercommunication and data interchange. 
Every framework agent must provide a POSIX interface. See section 5.2.1 below regarding 
MARS command line and standard input parameter encoding. 

5.1.3 HTTP + CGI 

The framework adopts HTTP+CGI as the primary distributed (network) interface used for 
agent intercommunication and data interchange. 

Every framework agent must provide an HTTP+CGI interface using the HTTP GET 
method. See section 5.2.1 below regarding MARS CGI parameter encoding. 

5.1.4 Java 

Agents which are implemented using the Metia Framework SDK will provide for direct 
method invocation according to the Agency Java interface, included in the SDK. 

5.1.5 CORBA 

Agents may provide for direct method invocation via a CORBA interface according to the 
Agency IDL interface, included in the Metia Framework SDK. 
5.2 Agent Intercommunication 

Agents communicate with one another, and with external clients and processes, using 
MARS metadata semantics, encoded as a property set (a set of values associated with named 
properties). MARS property sets are the only allowed means of communication, regardless 
of the interface used. 

5.2.1 Property Set Specification 

MARS property sets can be passed to any agent in one of the following ways: 

1. Command Line Arguments (multiple sets separated by the special argument '--') 

Examples: 

-identifier xyz123 -language en -encoding xhtml 
-identifier abc -- -identifier def -- -identifier ghi 

2. HTTP/CGI (multiple sets separated by the special valueless field '--') 

Examples: 

http://...&identifier=xyz123&language=en&encoding=xhtml 
http://...&identifier=abc&--&identifier=def&--&identifier=ghi 

3. Standard Input, encoded as XML instance 

Examples: 

<?xml version='1.0'?> 
<MARS> 
  <property_set> 
    <identifier><token>xyz123</token></identifier> 
    <language><en/></language> 
    <encoding><xhtml/></encoding> 
  </property_set> 
</MARS> 

<?xml version='1.0'?> 
<MARS> 
  <property_set> 
    <identifier><token>abc</token></identifier> 
  </property_set> 
  <property_set> 
    <identifier><token>def</token></identifier> 
  </property_set> 
  <property_set> 
    <identifier><token>ghi</token></identifier> 
  </property_set> 
</MARS> 
4. Software method invocation (passing instantiated MARS object) 

Examples: 

myAgent.retrieve(myMARS); 
myAgent.generate(sourceMARS, targetMARS); 

Command line/CGI arguments take precedence over standard input: if any are specified, 
standard input, if present, is treated only as an input data stream. Most interaction 
between agents will specify operations via either command line or CGI arguments. 
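
The first two encodings can be illustrated with a minimal Python sketch that splits a command line argument list and a CGI query string into property sets. It assumes the set separator is spelled `--` in both encodings; the function names are hypothetical and are not part of the Metia SDK.

```python
from urllib.parse import parse_qsl

SEPARATOR = "--"  # assumed spelling of the special set separator


def parse_command_line(args):
    """Split e.g. ['-identifier', 'abc', '--', '-identifier', 'def'] into property sets."""
    sets, current = [], {}
    it = iter(args)
    for arg in it:
        if arg == SEPARATOR:
            # The separator closes the current property set and opens a new one.
            sets.append(current)
            current = {}
        elif arg.startswith("-"):
            # A property name is followed by exactly one value.
            value = next(it, None)
            if value is None:
                raise ValueError(f"missing value for {arg!r}")
            current[arg[1:]] = value
        else:
            raise ValueError(f"unexpected argument: {arg!r}")
    sets.append(current)
    return sets


def parse_query_string(query):
    """Split a CGI query string into property sets on the valueless '--' field."""
    sets, current = [], {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name == SEPARATOR and value == "":
            sets.append(current)
            current = {}
        else:
            current[name] = value
    sets.append(current)
    return sets
```

Both functions return a list of dictionaries, one per property set, so the same downstream logic can be used regardless of which interface delivered the request.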

Every agent, regardless of implementation, must provide support for the first three 
interfaces defined above (command line, CGI, and standard input). Agents implemented 
using the Metia SDK must provide support for the fourth interface defined above (method 
invocation). 
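
For the standard input encoding, a receiving agent has to turn the XML instance back into property sets. The following Python sketch shows one way this might be done, assuming only the element shapes visible in the examples of this section (a value wrapped in a typed element such as `token`, or an empty element whose name is the value). The function name is hypothetical and the real MARS DTD may define further cases.

```python
import xml.etree.ElementTree as ET


def parse_mars_xml(text):
    """Return one dict per <property_set>, mapping property name to value."""
    root = ET.fromstring(text)
    sets = []
    for property_set in root.iter("property_set"):
        props = {}
        for prop in property_set:
            child = prop[0] if len(prop) else None
            if child is None:
                # Bare text content, e.g. <identifier>xyz123</identifier>
                props[prop.tag] = (prop.text or "").strip()
            elif child.text and child.text.strip():
                # Typed value, e.g. <identifier><token>xyz123</token></identifier>
                props[prop.tag] = child.text.strip()
            else:
                # Empty element, e.g. <language><en/></language>: the name is the value
                props[prop.tag] = child.tag
        sets.append(props)
    return sets
```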



5.2.2 Interpretation of Multiple Property Sets 

If multiple property sets are specified, either via arguments or standard input, then they are 
to be interpreted as follows: 

1. The first property set must contain an action property value. 

2. If only one property set is defined, then the single action is performed as specified by the 
property set. 

3. If the action of the first property set is 'store', then either both the component property 
must equal "meta" and the item property must equal "data", or the item property must equal 
"meta"; in which case the second property set is taken to be a metadata property set to be 
stored persistently. It is then an error for there to be more than two property sets in the 
input. 

4. If the action of the first property set is 'generate', then the first property set is taken as 
defining the target of the generation and the second property set is expected to define the 
source of the generation which must be retrieved. Any subsequent property sets are taken 
to be part of a compound action to be applied in succession to the results of the 
generation. It is then an error for any subsequent property set not to have an action 
defined. 

5. If all property sets have an action defined, then the input is taken to be a compound 
action, and each action is to be applied to the results of the previous action in succession. 
If a preceding action returns a data stream, then the subsequent action is to take that 
stream as input; otherwise, it is to retrieve the first item explicitly specified by a 
preceding property set. 

6. If the 'locate' action is included in a compound action sequence, then the chain of 
subsequent actions following the locate action are applied in succession to each of the 
items identified by the locate action. 

All other combinations of property sets are either invalid or left to the custom interpretation 
of the particular agent. 

It is not permitted for any Metia agent to apply an interpretation which conflicts with the 
interpretation specified above. 
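
The interpretation rules above can be sketched as a small classifier. This is an illustrative Python sketch only: property sets are modelled as plain dictionaries, the function name and the returned labels are hypothetical, and the per-item fan-out of the 'locate' action (rule 6) is left to whatever executes the resulting plan.

```python
def classify(property_sets):
    """Return a (kind, detail) pair describing how the input is to be read."""
    if not property_sets or "action" not in property_sets[0]:
        raise ValueError("the first property set must define an action")  # rule 1
    first, rest = property_sets[0], property_sets[1:]
    if not rest:
        return ("single", [first])  # rule 2
    action = first["action"]
    if action == "store":
        is_metadata = first.get("item") == "meta" or (
            first.get("component") == "meta" and first.get("item") == "data"
        )
        if is_metadata:  # rule 3
            if len(rest) > 1:
                raise ValueError("a metadata 'store' takes exactly two property sets")
            return ("store-metadata", {"target": first, "metadata": rest[0]})
    if action == "generate":  # rule 4
        source, followups = rest[0], rest[1:]
        if any("action" not in ps for ps in followups):
            raise ValueError("every set after the source must define an action")
        return ("generate", {"target": first, "source": source, "then": followups})
    if all("action" in ps for ps in property_sets):
        return ("compound", property_sets)  # rule 5
    # Anything else is invalid or agent-specific.
    return ("custom", property_sets)
```

Returning a label rather than executing the action keeps the interpretation step separate from the agent's own behavior, which is where any custom handling of the remaining combinations would live.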
5.2.3 Diagnostics and Error Notification 

All errors, warnings, cautions, and other notes output by an agent which are not part of a 
result value must be output on the standard error port composed as an XML instance 
conforming to the Metia Framework Diagnostics DTD: 

http://metia.nokia.com/schemas/diagnostics/1.0/dtd/ 
5.2.3.1 Diagnostic Notification Types 

The Metia Framework Diagnostics DTD provides for the following notification types: 

Error 

An error signals an occurrence which prevents an agent from continuing a particular 
process or task. The error condition may or may not be recoverable. Typically it is not. 

Warning 

A warning constitutes a condition or occurrence which could cause loss or corruption of 
information, damage to equipment, or failure of a critical service. 

Caution 

A caution constitutes a condition or occurrence which could affect the efficiency of 
equipment or of a service, or which may limit the effectiveness of a given process. 

Note 

A note constitutes any general information about equipment, a service, a process, or 
data which is considered significant. 

Debug 

A debug notification is any general information about the operation of the agent as 
regards its implementation and which might be meaningful to developers or maintainers 
of the agent software. 

The content of any given notification is free-form and may consist of pre-formatted 
diagnostics from legacy tools or systems, well formed XML markup, or any other textual 
data. By default, any given agent receiving diagnostics from another agent is required only 
to be able to recognize the particular notification type(s) and optionally display the literal 
notification(s) content (including any markup) to an end-user. Particular agents, however, 
may contract to use specific markup for notification content to facilitate specialized 
processing and/or display of notifications. 



5.2.3.2 Diagnostics in a CGI Environment 

In the case of an agent operating in a CGI environment, which does not provide for 
separate standard output and standard error streams, diagnostics may be returned either in 
place of the return value (in the case of a fatal error) or as part of a multipart MIME stream 
consisting first of the return value and secondly of the diagnostics instance. 
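
A minimal Python sketch of the two CGI cases, using the standard `email.mime` classes to build the body. The content types (`text/plain`, `text/xml`, `multipart/mixed`) and the function name are assumptions made for illustration; the framework text does not fix them, and the diagnostics markup itself must follow the Diagnostics DTD, which is not reproduced here.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def cgi_response(result_text, diagnostics_xml=None, fatal=False):
    """Build the body an agent might return over CGI (illustrative only)."""
    if fatal:
        # Fatal error: the diagnostics instance replaces the return value.
        return MIMEText(diagnostics_xml, "xml")
    if diagnostics_xml is None:
        # Nothing to report: return the value alone.
        return MIMEText(result_text, "plain")
    message = MIMEMultipart("mixed")
    message.attach(MIMEText(result_text, "plain"))   # first part: the return value
    message.attach(MIMEText(diagnostics_xml, "xml"))  # second part: the diagnostics
    return message
```

Calling `cgi_response(...).as_string()` yields the serialized MIME stream, including the boundary markers that let the receiving agent separate the return value from the diagnostics.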
6 Framework Components 

The Metia Framework is comprised of a number of components, each defining a core area 
of functionality needed in the construction of a complete production and distribution 
environment. 

Each framework component is defined separately by its own specification. This section only 
summarizes the role of each component within the Metia Framework. Please consult the 
specification for each framework component for more detailed information. 

6.1 Media Attribution and Reference Semantics (MARS) 

Media Attribution and Reference Semantics (MARS) is a metadata specification framework 
and core standard vocabulary and semantics facilitating the portable management, 
referencing, distribution, storage and retrieval of electronic media. 

MARS is the common "language" by which the different Metia Framework agencies 
communicate. 

MARS is designed specifically for the definition of metadata for use by automated systems 
and for the consistent, platform independent communication between software components 
storing, exchanging, modifying, accessing, searching, and/or displaying various types of 
electronic media such as documentation, images, video, etc. It is designed with 
considerations for automated processing and storage by computer systems in mind, not 
particularly for direct consumption by humans; though mechanisms are provided for 
associating with any given metadata property one or more presentation labels for use in user 
interfaces, reports, forms, etc. 

MARS aims to fulfill the following two goals: 

1. To define a framework within which metadata can be explicitly defined and efficiently 
and reliably processed by automated systems. 

2. To define a core metadata vocabulary of properties and values for automated systems 
used for storing, exchanging, operating on, and/or displaying electronic media. 

Utilizing a common abstract metadata vocabulary and semantics for all reference and 
communication functions by all agents within the framework affords a considerable amount 
of modularity, scalability, and flexibility for any given set of agents, as each agent constitutes 
a "black-box" and specific implementation details are irrelevant insofar as their interaction 
with users and other agents is concerned, and new agents added to an environment are 
immediately and transparently usable by existing processes. The core MARS vocabulary 
also provides for an information rich environment enabling processes and operations not 
possible using only simple identifiers such as filenames, URLs, DOIs, and similar. 

6.1.1 XML 

XML is used for the serialization, interchange, and (typically) persistent storage of MARS 
metadata property sets. The Metia Java SDK provides for the importation and exportation 
of MARS XML encoded instances to and from MARS class instances. 

6.1.2 XML DTD 

An XML DTD for the general framework and for the core properties defined by MARS is 
defined as a component of the Metia Framework. The common tools and processes 
operating on or directed by MARS metadata must support metadata property value sets 
encoded as XML instances conforming to this DTD. 

The defined DTD provides mechanisms by which additional properties and property values 
are defined as needed by particular business units, product lines, processes, etc. 

http://metia.nokia.com/schemas/mars/2.0/dtd/ 

6.1.3 XML Schema 

An XML Schema for the general framework and for the core properties defined by MARS 
is defined as a component of the Metia Framework, and the common tools and processes 
operating on or directed by MARS metadata must support metadata property value sets 
encoded as XML instances conforming to this Schema. 

The XML Schema provides for more rigorous validation of MARS XML instances, and is 
recommended over validation by DTD wherever possible. 

The defined XML Schema provides mechanisms by which additional properties and 
property values are defined as needed by particular business units, product lines, processes, 
etc. 

http://metia.nokia.com/schemas/mars/2.0/xsd/ 

6.1.4 RDF Schema 

An RDF Schema for the core properties defined by MARS is defined as a component of the 
Metia Framework. It grounds the semantic interpretation of those properties in the Dublin 
Core and Nokia Metadata for Documents, and provides a foundation for defining 
additional semantic qualities of the core vocabulary and its relationships to other 
vocabularies. 

http://metia.nokia.com/schemas/mars/2.0/rdf/ 

6.2 Generalized Media Archive (GMA) 

The Generalized Media Archive (GMA) is an abstract archival model for the storage and 
management of data based solely on Media Attribution and Reference Semantics (MARS) 
metadata; providing a uniform, consistent, and implementation independent model for 
information storage and retrieval, versioning, and access control. 

The GMA is a central component of the Metia Framework and serves as the common 
archival model for all managed media controlled and/or accessed by Metia Framework 
agencies. It constitutes an Agency, which may be implemented as one or more Agents. 

The GMA provides a uniform, generic, and abstract organizational model and functional 
interface to a potentially wide range of actual archive implementations; independent of 
operating system, file system, repository organization, or other implementation details. This 
abstraction facilitates the creation of tools, processes, and methodologies based on this 
generic model and interface which are insulated from the internals of the GMA compliant 
repositories with which they interact. 

The GMA defines specific behavior for basic storage and retrieval, access control based on 
user identity, versioning, and automated generation of variant encodings. The identity of 
individual storage items is based on MARS and all interaction between a client and a GMA 
implementation must be expressed as MARS metadata property sets. 
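
To make the idea of metadata-only addressing concrete, the following Python sketch models a toy archive whose items are identified purely by their property sets. It is an assumption-laden illustration, not the GMA interface: the class and method names are hypothetical, and versioning, access control, and variant generation are omitted.

```python
class InMemoryArchive:
    """Toy archive addressed only by MARS-style property sets."""

    def __init__(self):
        self._items = {}

    @staticmethod
    def _key(props):
        # Identity is the property set itself, not a file name or path.
        return frozenset((k, v) for k, v in props.items() if k != "action")

    def store(self, props, data):
        self._items[self._key(props)] = data

    def retrieve(self, props):
        return self._items[self._key(props)]

    def locate(self, props):
        """Return the property sets of all stored items matching `props`."""
        wanted = self._key(props)
        return [dict(key) for key in self._items if wanted <= key]
```

A client built against such an interface never sees how or where the data is held, which is the insulation from repository internals that the abstract model is meant to provide.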
6.3 Portable Media Archive (PMA) 

The Portable Media Archive (PMA) is a physical organization model of a file system based 
data repository conforming to and suitable for implementations of the Generalized Media 
Archive (GMA) abstract archival model. 

The PMA defines an explicit yet highly portable file system organization for the storage and 
retrieval of information based on Media Attribution and Reference Semantics (MARS) 
metadata. The PMA uses the MARS Identity and Item Qualifier metadata property values 
themselves as directory and/or file names, avoiding the need for a secondary referencing 
mechanism and thereby simplifying the implementation, maximizing efficiency, and 
producing a mnemonic organizational structure. 
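
The use of property values as directory names can be sketched in Python as follows. The ordering of properties and the function name are hypothetical, chosen only to illustrate the principle; the actual directory layout is defined by the separate PMA specification.

```python
from pathlib import PurePosixPath

# Hypothetical ordering of identity properties; the real layout is set by the PMA specification.
IDENTITY_ORDER = ("identifier", "language", "encoding")


def archive_path(root, props):
    """Map a property set to a directory path built from its own values."""
    parts = []
    for name in IDENTITY_ORDER:
        if name not in props:
            break
        value = props[name]
        if "/" in value or value in ("", ".", ".."):
            raise ValueError(f"value not usable as a directory name: {value!r}")
        parts.append(value)
    if not parts:
        raise ValueError("at least an identifier is required")
    return PurePosixPath(root).joinpath(*parts)
```

Because the path is derived directly from the metadata, no lookup table is needed to find an item, and a person browsing the file system can read the item's identity from its location.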

Any GMA may use a physical organization model other than the PMA. The PMA physical 
archival model is not a requirement of the GMA abstract archival model. However, the 
PMA may nevertheless be employed by such implementations both as a data interchange 
format between disparate GMA implementations as well as a format for storing portable 
backups of a given archive. 

6.4 Registry Service Architecture (REGS) 



The Registry Service Architecture (REGS) is a generic architecture for dynamic query 
resolution agencies based on the Metia Framework and Media Attribution and Reference 
Semantics (MARS), providing a unified interface model for a broad range of search and 
retrieval tools. A particular registry service constitutes an Agency, which may be 
implemented as one or more Agents. 

REGS provides a generic means to interact with any number of specialized search and 
retrieval tools using a common set of protocols and interfaces based on the Metia 
Framework; namely MARS metadata semantics and either a POSIX or CGI compliant 
interface. As with other Metia Framework components, this allows for much greater 
flexibility in the implementation and evolution of particular solutions while minimizing the 
interdependencies between the tools and their users (human or otherwise). 

Being based on MARS metadata allows for a high degree of automation and tight 
synchronization with the archival and management systems used in the same environment, 
with each registry service deriving its own registry database directly from the metadata 
stored in and maintained by the various archives themselves; while at the same time, each 
registry service is insulated from the implementation details of and changes in the archives 
from which it receives its information. 

Every registry service shares a common architecture and fundamental behavior, differing 
primarily only in the actual metadata properties required for their particular application. 



6.5 Java SDK 



The Metia Java SDK (Software Development Kit) provides software components 
implementing the core models and behavior defined by the Metia Framework and its 
components. 

The SDK is implemented in Java conforming to the Java 2 platform specification and 
resides in the Java package com.nokia.ncde. 

This section provides a general overview of the principal classes and interfaces defined in 
the SDK. Consult the JavaDoc documentation for more information about these and other 
classes and components. 

6.5.1 MARS 

MARS (com.nokia.ncde.MARS) is a Java class which provides a uniform container for 
storing, accessing, defining, and passing MARS metadata property sets, including methods 
for importing from and exporting to XML encoded instances conforming to the MARS 
DTD. 

6.5.2 Agency 

Agency (com.nokia.ncde.Agency) is a Java interface which defines the common behavior 
(methods) which are implemented and shared by all Framework agents. 

6.5.3 Agent 

Agent (com.nokia.ncde.Agent) is a Java abstract class which implements the Agency 
interface and provides default methods for basic agent behavior, and which is typically the 
parent or ancestor class of specific agent implementations built using the Metia SDK. 

6.5.4 AgentProxy 

AgentProxy (com.nokia.ncde.AgentProxy) is a Java wrapper class which provides a 
convenient mechanism for interacting with the network CGI interface of any Agency, as if 
it were a local object within a Java application (typically an agent). 

6.5.5 AgentServlet 

AgentServlet (com.nokia.ncde.AgentServlet) is a Java wrapper class which provides Java 
Servlet functionality to any class implementing the Agency interface. 

6.5.6 AgentServer 

AgentServer (com.nokia.ncde.AgentServer) is a Java wrapper class which provides CORBA 
server functionality to any class implementing the Agency interface. 

6.5.7 AgentClient 

AgentClient (com.nokia.ncde.AgentClient) is a Java wrapper class which provides CORBA 
client functionality to any class implementing the Agency interface. 
1 Scope 

This document defines the Media Attribution and Reference Semantics (MARS), a metadata 
specification framework and core standard vocabulary and semantics facilitating the 
portable management, referencing, distribution, storage and retrieval of electronic media. 

2 Overview 

MARS is designed specifically for the definition of metadata for use by automated systems 
and for the consistent, platform independent communication between software components 
storing, exchanging, modifying, accessing, searching, and/or displaying various types of 
information such as documentation, images, video, etc. It is designed with considerations 
for automated processing and storage by computer systems in mind, not particularly for 
direct consumption by humans; though mechanisms are provided for associating with any 
given metadata property one or more presentation labels for use in user interfaces, reports, 
forms, etc. 

MARS aims to fulfill the following two goals: 

1. To define a framework within which metadata can be explicitly defined and efficiently 
and reliably processed by automated systems. 

2. To define a core metadata vocabulary of properties and values for automated systems 
used for storing, exchanging, operating on, and/or displaying electronic media. 

Extensibility of the core vocabulary is of course of paramount importance, as MARS cannot 
address all of the needs of all groups, systems, processes, products fully and still serve as a 
manageable standard; nor can it foresee all possible needs and applications in the future; 
however, it remains possible and beneficial both to define as rigorously as possible a 
framework for metadata and a core vocabulary and then enable extensions and 
enhancements to that core as needed, within the constraints of that framework. 

It is important to note that the core vocabulary defined by MARS is data-centric and not 
use-centric, in that the metadata properties defined therein apply primarily to characteristics 
or attributes of the data itself, and not how, where, or by whom the data is used or 
referenced. Processes such as for Product Data Management (PDM), Configuration 
Management (CM), and Work Flow Management (WFM) are not directly addressed in the 
core MARS vocabulary as these define uses of the data and not characteristics of the data 
itself. 

The core vocabulary is specifically designed to meet the needs of organization and 
management processes applied to large volumes of technical and user documentation, 
though the framework and most if not all of the core vocabulary is applicable to many other 
applications as well. 
3 Related Documents, Standards, and Specifications 

3.1 Metia Framework for Electronic Media 

The Metia Framework is a generalized metadata driven framework for the management and 
distribution of electronic media which defines a set of standard, open and portable models, 
interfaces, and protocols facilitating the construction of tools and environments optimized 
for the management, referencing, distribution, storage, and retrieval of electronic media; as 
well as a set of core software components (agents) providing functions and services relating 
to archival, versioning, access control, search, retrieval, conversion, navigation, and 
metadata management. 

MARS is a component of the Metia Framework and serves as the common "language" by 
which the different Metia Framework agents communicate. 

http://metia.nokia.com/specifications/#Metia 

3.2 Generalized Media Archive (GMA) 

The Generalized Media Archive (GMA), a component of the Metia Framework, is an 
abstract archival model for the storage and management of data based solely on Media 
Attribution and Reference Semantics (MARS) metadata; providing a uniform, consistent, 
and implementation independent model for information storage and retrieval, versioning, 
and access control. 

http://metia.nokia.com/specifications/#GMA 

3.3 Portable Media Archive (PMA) 

The Portable Media Archive (PMA), a component of the Metia Framework, is a physical 
organization model of a file system based data repository conforming to and suitable for 
implementations of the Generalized Media Archive (GMA) abstract archival model. 

http://metia.nokia.com/specifications/#PMA 

3.4 Registry Service Architecture (REGS) 

The Registry Service Architecture (REGS), a component of the Metia Framework, is a 
generic architecture for dynamic query resolution agencies based on the Metia Framework 
and Media Attribution and Reference Semantics (MARS), providing a unified interface 
model for a broad range of search and retrieval tools. 

http://metia.nokia.com/specifications/#REGS 

3.5 Nokia Metadata for Documents 

MARS is a derivative of Nokia Metadata for Documents. MARS deviates from that work to 
some degree in order to meet the specific requirements of the Metia Framework; primarily 
where identity and management properties and more rigorous data typing is required. 

Within all systems and environments based on Metia Framework, MARS supersedes the 
Nokia Metadata for Documents specification for all metadata related applications. 
3.6 The Dublin Core 

The Dublin Core is a metadata element set intended to facilitate discovery of electronic 
resources. Originally conceived for author-generated description of Web resources, it has 
attracted the attention of formal resource description communities such as museums, 
libraries, government agencies, and commercial organizations. 

MARS can be viewed as a functional superset of the Dublin Core, and an RDF Schema for 
MARS could be created which inherits directly from the Dublin Core RDF Schema, such 
that any tools which are designed to operate on Dublin Core compliant metadata will also 
be able to operate correctly on MARS compliant metadata. 

http://purl.oclc.org/metadata/dublin_core/ 

3.7 ISO 639: Language Codes 

ISO 639 specifies a set of two-letter codes represented by case-insensitive ASCII characters 
which uniquely identify world languages. 

MARS adopts ISO 639 language codes for the allowed values of certain property types. 

http://www.iso.ch/ 

3.8 ISO 3166-1: Country Codes 

ISO 3166-1 specifies a set of two-letter codes represented by case-insensitive ASCII 
characters which uniquely identify countries. 

MARS adopts ISO 3166-1 country codes for the allowed values of certain property types. 

http://www.iso.ch/ 

3.9 ISO 8601: General Date and Time Formats 

ISO 8601 specifies a number of standard methods for encoding date and time information, 
for portability between different computer systems and applications. 

MARS adopts a subset of ISO 8601 encodings for the allowed values of certain property 
types. 

http://www.iso.ch/ 

3.10 W3C TR NOTE datetime: Specific Date and Time Formats 

The datetime W3C TR note defines a profile of ISO 8601, the International Standard for the 
representation of dates and times, restricting the supported formats to a smaller number 
likely to satisfy most requirements. 

MARS adopts a subset of the W3C datetime NOTE encodings for the allowed values of 
certain property types. 

http://www.w3.org/TR/NOTE-datetime 
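
As an illustration of how such property values might be checked, the following Python sketch tests a string against the formats of the W3C datetime profile (year, year-month, complete date, and complete date with time and time zone designator) and tests the shape of a two-letter code. It is a syntactic check only: it does not say which subset MARS actually adopts, does not verify that a day exists in its month, and does not look codes up in the ISO registries. The function names are hypothetical.

```python
import re

W3C_DATETIME = re.compile(
    r"\d{4}"                                   # YYYY
    r"(?:-(?:0[1-9]|1[0-2])"                   # -MM
    r"(?:-(?:0[1-9]|[12]\d|3[01])"             # -DD
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"           # Thh:mm
    r"(?::[0-5]\d(?:\.\d+)?)?"                 # optional :ss and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"     # time zone designator, required with a time
    r")?)?)?"
)

TWO_LETTER_CODE = re.compile(r"[A-Za-z]{2}")


def is_w3c_datetime(value):
    """True if `value` has one of the W3C datetime profile shapes."""
    return W3C_DATETIME.fullmatch(value) is not None


def has_two_letter_code_shape(value):
    """True if `value` looks like a two-letter, case-insensitive code."""
    return TWO_LETTER_CODE.fullmatch(value) is not None
```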
3.11 RFC 2046: MIME (Multipurpose Internet Mail Extensions) 

The IETF MIME standard defines a platform independent and portable media typing system 
and defines an initial set of media types and general media encoding properties. The MIME 
system is used by a broad range of internet and other systems, standards, and protocols. 

MARS adopts RFC 2046 content type and character set identifiers for the allowed values of 
certain property types. 

http://www.ietf.org/rfc/rfc2046.txt?number=2046 

3.12 W3C TR xptr: XML Pointer Language 

XPointer, which is based on the XML Path Language (XPath), supports addressing into the 
internal structures of XML documents. It allows for traversals of a document tree and 
choice of its internal parts based on various properties, such as element types, attribute 
values, character content, and relative position. 

MARS adopts W3C XPointer syntax for the allowed values of certain property types. 

http://www.w3.org/TR/xptr 


3.13 Common Gateway Interface (CGI) 

The Common Gateway Interface (CGI) is a standard for interfacing external applications 
with information servers, such as Web servers. Within the new Metia Framework, CGI will 
serve as the primary communication mechanism between networked clients and software 
agents. 

The MARS Agency data type is comprised of a CGI URL prefix. 

http://hoohoo.ncsa.uiuc.edu/cgi/overview.html 



3.14 RFC 2396: Uniform Resource Identifier (URI) 

A Uniform Resource Identifier (URI) is a compact string of characters for identifying an 
abstract or physical resource. It serves as the general syntax by which URNs, URLs, and 
other identifiers are defined. 

MARS adopts RFC 2396 URIs for the allowed values of certain property types. 

http://www.ietf.org/rfc/rfc2396.txt?number=2396 

3.15 RFC 2141: Uniform Resource Name (URN) 

Uniform Resource Names (URNs) are intended to serve as persistent, location-independent,
resource identifiers and are designed to make it easy to map other namespaces (which share
the properties of URNs) into URN-space. The URN syntax provides a means to encode
character data in a form that can be sent in existing protocols, transcribed on most
keyboards, etc.

MARS adopts RFC 2141 URNs for the allowed values of certain property types.

http://www.ietf.org/rfc/rfc2141.txt?number=2141

3.16 RFC 1738: Uniform Resource Locator (URL)

A Uniform Resource Locator (URL) is a compact string of characters for identifying a
physical resource available via the Internet. It is the most common form of URI presently in
use on the web.

MARS adopts RFC 1738 URLs for the allowed values of certain property types.

http://www.ietf.org/rfc/rfc1738.txt?number=1738

3.17 Unicode 

The Unicode Standard is a fixed-width, uniform encoding scheme for written characters and
text. The repertoire of this international character code for information processing includes
characters for the major scripts of the world, as well as technical symbols in common use.

MARS adopts Unicode for the allowed values of string property types.



3.18 POSIX Regular Expression Syntax 



POSIX (Portable Operating System Interface) is a set of standard operating system
interfaces based on the UNIX operating system. The POSIX interfaces were developed
under the auspices of the IEEE (Institute of Electrical and Electronics Engineers). Regular
expressions are used to recognize specific patterns within textual data. POSIX defines a
standard encoding for regular expressions.

MARS expresses property value types using POSIX regular expression syntax.
^ttp://Bati\4ante4ffee,Qra^c^triofi^9Hfi/mdeXtto^ 

3.19 Metadata for Graphics in Customer Documentation

Guidelines for the application of MARS metadata for the management of and access to
graphics media in the NET Customer Documentation Environment (NCDE).

http://helnsl2/NCDE2001/AdvancedGraphics/specification

4 Key Terms and Concepts
4.1 Property 

A property, for the purpose of this specification, is a quality or attribute which can be
assigned or related to an identifiable body of information, and is defined as an ordered
collection of one or more values sharing a common name. The name of the collection
represents the name of the property and the value(s) represent the realization of that
property. Typically, constraints are placed on the values which may serve as the realization
of a given property.

4.2 Property Set 

A property set is any set of valid MARS metadata properties. 



4.3 Media Object 

Media objects represent abstract bodies of information about which we can communicate
and which correspond to common organizational concepts such as "document", "book",
"manual", "chapter", "section", "sidebar", "table", "image", "chart", "diagram", "graph",
"photo", "video segment", "audio stream", etc.

They are, however, abstract and have no specification for any given language, coverage, or
encoding. The same media object can be realized in many languages, with many
geographical, regional, distributional, or other variations, and be encoded in a multitude of
formats, without affecting in the least the scope and qualities of the information that they
embody.

An abstract media object is given an identifier which is intended to be unique for the entire
known universe. So long as all media objects within a given environment follow the same
identification scheme, or any number of mutually exclusive schemes, then all will be well.
It is up to the tools and processes in use to ensure that media object identifiers remain
unique within any given environment.

4.4 Media Instance 

A media instance represents a particular realization of an abstract media object for a
particular language, coverage, encoding, and release. Every distinct combination of these
four properties constitutes a different instance of the media object. Some (in fact most)
instances of a given media object will be automatically generated, derived from some other
instance, particularly those differing in encoding. Similarly, instances in various languages
will typically all be derived from a single instance, representing the source language from
which all translations to other languages are made.

4.5 Media Component 

Each media instance is comprised of a set of components, which are all intimately related to
that particular realization and inseparable from it. Most of these components are
automatically generated, or are accessed and modified only indirectly via one or more
storage and/or management systems. The only mandatory component for a media instance is
the data component. The existence and use of other components depends on the specific
needs, functions, requirements, or processes comprising the environment within which that
data resides. MARS defines a bounded set of component types, though this may be
extended as needed as new requirements, processes, or methodologies arise.

Media objects may also contain components, in which case the components are taken to
represent properties or other characteristics inherited by or attributable to each instance of
that media object.

4.6 Storage Item

Storage items constitute the only actual physical entities within a MARS based
environment. Just as a media instance is comprised of one or more components, so a
component is comprised of one or more storage items.

Items correspond to what would typically be stored in a single file or database record, and
are the things which are actually created, encoded, modified, transferred, etc. Items may
embody content, content fragments, metadata, revision deltas, or other information needed
for the reliable storage, management, and processing of a given media component. Items are
the discrete computational objects which are passed from process to process, and which
form the building blocks from which the information space and the environment used to
manage, navigate, and manipulate it are formed.

4.7 Qualified Data Item 

Any given 'data' storage item for any component may be qualified in one or more of the
following ways: 

4.7.1 Content Pointer 

MARS provides for referencing (and hence defining an explicit identity for) specific
content within a given item, component, instance, or object, depending on the nature of the
reference. E.g., a particular element within an SGML, HTML, or XML entity can be
referenced by a unique element identifier, which would be valid for all of the above
mentioned scopes. Alternatively, the reference could be based on a particular path through
the structure of the entity, possibly specifying a given range of data content characters, in
which case it might be valid only for a particular component or item.

MARS adopts the W3C XPointer standard for encoding such content specific references in
SGML, HTML, or XML content, and it is up to a given application, process, or
methodology to ensure the validity of references applied at a given scope. It is
recommended that explicit element ID values be used for all pointer references wherever
possible, and that structural paths and data content specific references be avoided, for the
sake of maximal validity of pointer values to all realizations of a given media object,
irrespective of language, coverage, encoding, or partitioning.

Though XPointer is not yet a final Recommendation by the W3C, and some changes may
occur within the standard, it is presently a Candidate Recommendation and is expected to
reach full Recommendation status in the very near future.

Future versions of MARS may adopt additional internal pointer mechanisms for other
encodings as needed and as available.

Content pointers are only defined for 'data' storage items.

4.7.2 Revision

A revision is an identifiable editorial milestone for a 'data' storage item within the scope of a
particular managed release. It is a snapshot in time, either static or reproducible, to which
one can return.

Revisions are only defined and maintained for 'data' storage items.

4.7.3 Fragment

A fragment is an identifiable linear sub-sequence of the data content of a component, either
static or reproducible, which can be provided in cases where the full content is either too
large in volume for a particular application or not specifically relevant.

Fragments are only defined and maintained for 'data' storage items.

4.8 Inheritance of Metadata 

Metadata defined at higher scopes is inherited by lower scopes. There are two simple rules
governing the inheritance of metadata from higher scopes to lower scopes:

1. All metadata properties defined in higher scopes are fully visible, applicable, and
meaningful in all lower scopes, without exception.

2. Any property defined in a lower scope completely overrides, hides, shadows, and replaces
any definition of the same property that might exist in a higher scope.

Thus, all metadata properties defined for a media object are inherited by all instances of that
object; and all metadata properties defined for a media instance (or media object) are
inherited by all of its components.

MARS does not define the mechanisms, algorithms, or other procedures for effecting the
inheritance of metadata properties defined in higher scopes to operations performed in
lower scopes. It is the responsibility of the tools and processes to ensure that metadata is
inherited properly and reliably.
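
The two inheritance rules amount to a lookup that starts at the lowest scope and moves upward until a definition is found. The following is a minimal sketch only: MARS leaves the mechanism to the tools, so the function name `resolve_property`, the dictionary layout, and the sample property names and values are all hypothetical, and each property is simplified to a single string value.

```python
from typing import Mapping, Optional, Sequence


def resolve_property(name: str,
                     scopes: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the effective value of a property.

    `scopes` is ordered from lowest to highest scope, for example
    [component, instance, media_object]. The first scope that defines
    the property wins (rule 2); otherwise the value is inherited from
    a higher scope (rule 1).
    """
    for properties in scopes:
        if name in properties:
            return properties[name]
    return None  # not defined at any scope


# Hypothetical metadata, for illustration only.
media_object = {"title": "user_guide", "status": "draft"}
instance = {"language": "en", "status": "approved"}
component = {}

chain = [component, instance, media_object]
print(resolve_property("title", chain))   # inherited from the media object
print(resolve_property("status", chain))  # overridden at the instance scope
```

Keeping each scope as its own mapping, rather than merging them in advance, means a later change at a higher scope is seen by every lower scope that does not override it.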

4.9 Versioning Model

MARS defines a simple, portable, and practical versioning model using only two levels of
distinction, corresponding to the concepts of 'release' and 'revision'.

A release is a published version of a media instance which is maintained and/or distributed
in parallel to other releases. One could view a release as a branch in common tree based
versioning models. A revision is a milestone in the editorial lifecycle of a given release, or a
node on a branch.

In addition to release and revision, a particular coverage can be defined and applied to a
media instance to differentiate variant content intended for a particular application and/or
audience.

5 Metadata Classification and Naming Conventions



5.1 Property Name

All property names must be valid tokens (see formal specification in section 5.2.1).
Furthermore, all property name tokens for a given environment share the same lexical
scope.

The format for tokens was motivated by the desire to have a naming scheme which could be
used consistently across a very broad scope of encodings. This not only makes adoption and
application of such a standard easier in a heterogeneous environment but also simplifies the
construction of and interaction between common tools and processes.

Compatibility with a very broad set of encoding schemes allows for MARS metadata
property names and token values to be used as variables, symbols, names, tokens,
identifiers, directories, filenames, etc. in the various encoding schemes, allowing for
consistent semantics both for the metadata itself as well as for the systems, applications and
models storing, operating on, describing, and/or referencing that metadata.

Encodings for which the token format is known to be compatible include:

Programming/Scripting/Command Languages:

C, C++, Objective-C, Java, Visual BASIC, Ada, Smalltalk, LISP, Emacs Lisp,
Scheme, Prolog, JavaScript/ECMAScript, Perl, Python, TCL, Bourne Shell, C
Shell, Z Shell, Bash, Korn Shell, POSIX, Win32, REXX, SQL.

Markup/Typesetting Languages:

SGML, XML, HTML, XHTML, DSSSL, CSS, PostScript, PDF.

File Systems:

FAT (MS-DOS), VFAT (Windows 95/98), NTFS (Windows NT/2000), HFS
(Macintosh), HPFS (OS/2), HP/UX, UFS (Solaris), ext2 (Linux), ODS-2 (VMS),
NFS, ISO 9660 (CDROM), UDF (CDR/W, DVD).

It is likely that there exist many other encodings, in addition to those listed above, with
which the MARS token format is compatible.

5.2 Property Value Type



MARS defines a number of property value types which serve to constrain the format and
content of specific values. These data typing constraints simplify the construction of
software systems which operate on MARS metadata, and provide for more consistent and
uniform usage.

The total length or magnitude of property values, or sets of values, is only dependent on the
storage limitations of the systems and tools operating on the metadata. MARS itself imposes
no arbitrary restrictions.

Specific environments, processes, systems, or applications might restrict the magnitude of
one or more value types to satisfy storage, bandwidth, or other constraints. MARS property
value types may be constrained further (e.g. limiting Identity property token values to 30
characters, or limiting integers to the range 0..9999) but may not be relaxed in any fashion
(e.g. allowing tokens to have case distinction or include white space or colon characters,
etc.). It is up to each system and/or application to address the risk of data loss or corruption
when unable to support the magnitude of existing metadata property values.

Many property values are "Environment Dependent". This means that they may be specific
to a given system or LAN, or may be defined by an organization, business unit, product
line, etc. and thus not have global significance, nor be guaranteed to be globally unique if two
previously disjunct environments are merged, where e.g. a token is used as the value for a
given property in both environments, but with different semantics.

In the property specifications below, properties which may have values which are
environment dependent are marked with an asterisk.

Although MARS defines only a core set of metadata properties, and one can extend MARS
with additional properties and allowed values for core MARS properties, it remains an
important goal to maintain as much uniformity and consistency as possible between all
applications of MARS. Every possible effort should be made to publish and synchronize all
MARS extended property sets, with the addition of new properties and values to the core
standard where clearly justified by common usage.

5.2.1 Token

Any sequence of characters beginning with a lowercase alphabetic character followed by
zero or more lowercase alphanumeric characters with optional single intervening underscore
characters. More specifically, any string matching the following POSIX regular expression:

/[a-z](_?[a-z0-9])*/

Examples:

abcd
ab_cd
a123
x2_3_4_5
this_is_a_very_long_token_value
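
A token check can be sketched in a few lines. The pattern below is a Python rendering of the prose definition (a lowercase letter followed by lowercase alphanumerics, with optional single intervening underscores). The names `TOKEN_RE` and `is_token` are illustrative and are not part of MARS.

```python
import re

# fullmatch() anchors the pattern, so the whole string must be a token.
TOKEN_RE = re.compile(r"[a-z](_?[a-z0-9])*")


def is_token(value: str) -> bool:
    """Return True if `value` satisfies the token definition."""
    return TOKEN_RE.fullmatch(value) is not None


# A leading digit, an uppercase letter, a doubled underscore, and a
# trailing underscore are all rejected.
for candidate in ["abcd", "ab_cd", "a123", "Abcd", "ab__cd", "ab_", "1abc"]:
    print(candidate, is_token(candidate))
```

Anchoring matters here: an unanchored search would accept any string that merely contains a token somewhere inside it.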

Most MARS metadata properties are of type token, particularly those which are controlled
sets. In fact, a token value type can usually be considered synonymous with an explicit,
bound, and typically ordinal set of values. The primary reasons for this are (1) information
management processes based on controlled sets of explicitly defined values are more robust
than those based on arbitrary values, and (2) that current and emerging tools and
technologies for modeling, encoding, and processing structured information such as
metadata provide special functionality for defining, validating, and processing bounded sets
of token like symbols, which are not available for arbitrary strings.

Furthermore, because MARS is intended for the management of very large documentation
sets (millions or even billions of managed objects), practical considerations must be taken
into account, and token values impose far fewer demands on storage than arbitrary strings in
most circumstances. Since presentation issues can be addressed separately from internal
representations, more concise and efficient token values can be utilized. Longer, more
user-friendly, and mnemonic labels may be associated with each property name and token value,
including different labels for various languages or other needs, which can be defined once
in a schema or similar specification and used wherever needed when presenting metadata
information to a human being, without unnecessarily burdening the systems storing,
operating on, or being directed/controlled by that metadata.

All defined token values must have an explicitly specified and fixed value for both a 'name'
(corresponding to the token itself) and a 'label' (used for presentation purposes).



5.2.2 Integer

Any sequence of one or more decimal digit characters, optionally preceded by a sign,
representing a signed integer value. More specifically, any string matching the following
POSIX regular expression:

/[\-\+]?[0-9]+/

Examples:

12345
0
-9590728691
32
+32

5.2.3 Count

Any sequence of one or more decimal digit characters representing an unsigned
(non-negative) integer value. More specifically, any string matching the following POSIX
regular expression:

/[0-9]+/

Examples:

12345
0
9590728691
32

5.2.4 Decimal

Any floating point numerical value in simple decimal notation. More specifically, any string
matching the following POSIX regular expression:

/[\-\+]?[0-9]+\.[0-9]+/

Examples:

12345.0
+0.02
5.9590728691
-32.23

5.2.5 Percentage



Any percentage value belonging to the integer value range from 0 to 100. More specifically,
any string matching the following POSIX regular expression:

/(100)|([1-9][0-9])|([0-9])/

Examples:

15
3
73
100

Percentage values should not be prefixed or suffixed by a percent sign.
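
The four numeric value types (Integer, Count, Decimal, Percentage) overlap, so a single string can satisfy several of them at once. The sketch below assumes Python renderings of the prose definitions. In particular, the Decimal pattern assumes at least one digit on each side of the decimal point, which agrees with the examples given. The names `NUMERIC_TYPES` and `matching_types` are illustrative only.

```python
import re
from typing import List

# Patterns follow the prose definitions of the four numeric value types.
NUMERIC_TYPES = {
    "integer": re.compile(r"[-+]?[0-9]+"),
    "count": re.compile(r"[0-9]+"),
    "decimal": re.compile(r"[-+]?[0-9]+\.[0-9]+"),
    "percentage": re.compile(r"100|[1-9][0-9]|[0-9]"),
}


def matching_types(value: str) -> List[str]:
    """Return the names of all numeric value types that `value` satisfies."""
    return [name for name, pattern in NUMERIC_TYPES.items()
            if pattern.fullmatch(value)]


for sample in ["73", "+32", "-32.23", "101", "07"]:
    print(sample, matching_types(sample))
```

Note that a zero-padded value such as "07" is a valid Integer and Count but not a valid Percentage, because the Percentage pattern does not allow a leading zero on a two-digit value.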



5.2.6 String

Any sequence of one or more Unicode character/glyph code points. The particular Unicode
conformant encoding (e.g. UTF-8, UTF-16, etc.) is system and application dependent and
not specified explicitly by MARS.

5.2.7 Date

A string conforming to ISO 8601 & W3C TR NOTE datetime-19980827, defining a
complete date:

YYYY-MM-DD

where:

YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
- = literal separator (hyphen)

Examples:

1966-03-31
2000-05-01
2193-12-31

5.2.8 Time 

A string conforming to ISO 8601 & W3C TR NOTE datetime-19980827, defining a
complete date plus hours, minutes, and seconds in Universal Coordinated Time:

YYYY-MM-DDThh:mm:ssZ

where:

YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
T = literal separator indicating start of time component
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
Z = time zone designator for Universal Coordinated Time (UTC)
- = literal separator (hyphen)
: = literal separator (colon)

Examples:

1966-03-31T05:11:23Z
2000-05-01T22:54:08Z
2193-12-31T23:59:59Z
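
Values of the Date and Time types can be checked and converted with a standard date library. The sketch below uses Python's `datetime` module. The function names `parse_mars_date` and `parse_mars_time` are hypothetical. The explicit shape check is added because `strptime()` on its own also accepts fields that are not zero-padded.

```python
import re
from datetime import date, datetime, timezone

# Strict shape checks: strptime() alone would also accept "2000-5-1".
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_TIME_SHAPE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-5][0-9]:[0-5][0-9]Z")


def parse_mars_date(value: str) -> date:
    """Parse a complete date of the form YYYY-MM-DD."""
    if not _DATE_SHAPE.fullmatch(value):
        raise ValueError(f"not a YYYY-MM-DD date: {value!r}")
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_mars_time(value: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ssZ and attach the UTC time zone."""
    if not _TIME_SHAPE.fullmatch(value):
        raise ValueError(f"not a YYYY-MM-DDThh:mm:ssZ time: {value!r}")
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parse_mars_date("1966-03-31"))
print(parse_mars_time("2000-05-01T22:54:08Z"))
```

Because both formats are fixed-width with the most significant field first, plain string comparison of two valid values gives the same order as comparing the parsed dates.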



5.2.9 Ranking

A ranking value is a sequence of dot-separated integers. More specifically, any string
matching the following POSIX regular expression:

/[\-\+]?[0-9]+(\.[\-\+]?[0-9]+)*/

Examples:

7
3.11.4.7
-2.1.2.9
2.-1.1



A ranking value defines a path in an ordered tree of nodes where the value of each dot
delimited field specifies the sort order of the node in the tree at that level of the path. The
root node of the tree is not defined explicitly. The first integer value thus defines the sort
order relating to the immediate children (level 1) of the implicit root, the next integer
defines the sort order relating to the children of that level 1 node, etc. This defines a tree
where the linear ordering of nodes is derivable by a depth first ordered traversal of the tree.
E.g. the token:ranking pairs foo:1, bar:2, bas:3, and boo:4 represent the following tree:

(root)/
    1 (foo)
    2 (bar)
    3 (bas)
    4 (boo)

defining the ordered set:

foo < bar < bas < boo

We can insert a token 'xxx' between 'foo' and 'bar' with the ranking '1.1':

(root)/
    1 (foo)/
        1 (xxx)
    2 (bar)
    3 (bas)
    4 (boo)

defining the ordered set:

foo < xxx < bar < bas < boo

and then insert another token 'yyy' between 'foo' and 'xxx' with the ranking '1.0':

(root)/
    1 (foo)/
        0 (yyy)
        1 (xxx)
    2 (bar)
    3 (bas)
    4 (boo)

defining the ordered set:

foo < yyy < xxx < bar < bas < boo

Ranking values are used to define the order of ranked token values. It is not allowed for any 
two values defined for the same property in a given environment to have an identical 
ranking (i.e. to define the same path in the ordered tree of nodes). 

It is expected that ranked token sets are seldom extended, and that extensions would be
defined at the highest specification level possible, with all rank values normalized to simple
positive integer values. Nevertheless, the ranking value model defined here allows for
unlimited arbitrary insertion of new ranked token values into any existing sequence as
needed.
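
The depth first ordering of ranking values can be reproduced by comparing the rankings as tuples of integers. The following is a sketch under that assumption. The function names `ranking_key` and `order_tokens` are illustrative and not defined by MARS.

```python
from typing import Dict, List, Tuple


def ranking_key(ranking: str) -> Tuple[int, ...]:
    """Convert a ranking such as '1.0' into a tuple of integers, e.g. (1, 0)."""
    return tuple(int(field) for field in ranking.split("."))


def order_tokens(rankings: Dict[str, str]) -> List[str]:
    """Return token names sorted by their ranking values."""
    keys = [ranking_key(r) for r in rankings.values()]
    if len(set(keys)) != len(keys):
        # Two values for the same property may not define the same path.
        raise ValueError("two values share an identical ranking")
    return sorted(rankings, key=lambda token: ranking_key(rankings[token]))


ranked = {"foo": "1", "bar": "2", "bas": "3", "boo": "4",
          "xxx": "1.1", "yyy": "1.0"}
print(order_tokens(ranked))  # ['foo', 'yyy', 'xxx', 'bar', 'bas', 'boo']
```

Tuple comparison in Python is lexicographic and places a shorter tuple before any longer tuple that extends it, so a node always sorts before its own children and before its later siblings. This is the order a depth first traversal visits them.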

5.2.10 ID

A token which serves as a unique identifier for a particular property within a given
environment. ID token values need not be unique across all properties.

5.2.11 Actor

A string which serves as a unique identifier for an actor within a given environment. An
actor is either a person or a software application which operates on, or has special
responsibility or interest in the data in question. The actor identifier method employed must
be supported by the user authentication processes in use within each particular environment.

5.2.12 Agency 

A string comprising the URL prefix of the CGI interface to a Metia Framework agency, up
to and including the question mark; typically used to define the media object Archive or
other Metia Framework compliant archive where particular data resides. E.g.

"http://docserv.nokia.com/GMA?"

5.2.13 Content Type 

A string containing a valid MIME Content Type. E.g. "text/html", "text/xml", "image/gif",
"application/octet-stream", etc.

5.2.14 Character Set

A string containing a valid MIME Character Set identifier. E.g. "us-ascii", "iso-8859-1",
"utf-8", "utf-16", "gb2312", "iso-2022-jp", "shift_jis", "euc-kr", etc.

5.2.15 Encoding

An encoding is a complex data type representing a set of properties identified by a unique
token name. Encodings represent configurations of syntactic and semantic characteristics which
are significant to the production or management of information in a given environment.
Only values for properties defined as part of the Encoding Module (see section 6.6) may be
defined for an encoding data type. Encodings are the required data type for the 'encoding'
property in the Identity Module in section 6.1.5.

As with tokens, each encoding must have defined for it a 'name' and a 'label'. In addition,
every encoding must have defined for it a valid MIME 'content_type' value.

5.2.15.1 Simple Encoding

A simple encoding is one which has defined values only for the Encoding properties
'content_type' and (optionally) 'character_set' and 'suffix'. Simple encodings are roughly
equivalent in resolution to MIME encodings.

5.2.15.2 Complex Encoding

A complex encoding is one which has defined values for at least one other Encoding
property other than those allowed in a simple encoding, such as 'schema', 'line_delimitation',
etc.

5.2.16 Uniform Resource Identifier (URI)

Any valid Uniform Resource Identifier (URI).
This may be a URL (Uniform Resource Locator), a URN (Uniform Resource Name), or
some other form of URI.

5.2.17 Uniform Resource Locator (URL)

Any valid Uniform Resource Locator (URL).

A typical case is a URL referencing MARS classified data, consisting of a string containing
the set of MARS metadata property name/value pairs formatted as a URL encoded string
prefixed by the value of the "archive" property. E.g.

"http://xml.nokia.com/GMA?action=retrieve&identifier=dn99278&..."
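
A URL of this kind is an Agency prefix followed by URL encoded name/value pairs. As a rough sketch, assuming a placeholder host `archive.example.com` and a hypothetical helper named `build_mars_url`, such a URL could be assembled with the Python standard library as follows.

```python
from urllib.parse import parse_qsl, urlencode


def build_mars_url(agency: str, properties: dict) -> str:
    """Append URL-encoded property name/value pairs to an Agency prefix."""
    if not agency.endswith("?"):
        # An Agency value runs up to and including the question mark.
        raise ValueError("an Agency value must end with a question mark")
    return agency + urlencode(properties)


# The host and path below are placeholders, not a real archive.
url = build_mars_url(
    "http://archive.example.com/agency?",
    {"action": "retrieve", "identifier": "dn99278"},
)
print(url)

# The property pairs can be recovered from the query part of the URL.
print(dict(parse_qsl(url.split("?", 1)[1])))
```

Requiring the question mark in the prefix keeps the Agency value usable as-is: a client only has to concatenate the encoded pairs.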

5.2.18 Uniform Resource Name (URN) 

Any valid Uniform Resource Name (URN). 



5.2.19 Media Resource Name (MRN)

Section 8 defines an explicit and compact URN syntax based on MARS Identity metadata
properties for encoding the identity of any given storage item as a single string value.

5.3 Property Value Count 
5.3.1 Single 

A single value count means that there can be at most one value for a given property. 



5.3.2 Multiple

A multiple value count means that there can be one or more values for a given property.

The order of multiple values may or may not be significant, but nevertheless must be
preserved by any system or application storing, updating, accessing, or operating on the set
of values.

When encoded within a single string or field, multiple non-string values must be separated
by one or more white space characters. In the case of multiple string values, the individual
string values must be separated by line breaks. The line breaks are not included in any value
content, but all other white space is considered to be part of the value in which it occurs.
E.g.

"token1 token2 token3"

"2000-02-19
2000-11-07"

"12 34 56 78 90"

"First string value.
Second string value."

If a string value contains any line breaks, they must be immediately preceded by a backslash
'\' character. The backslash is not included as part of the value content. E.g.

"Here is a string value\
with an embedded line break."

User interfaces which expect single values for particular string properties may choose to
map line breaks in user input to spaces rather than interpreting the input as a sequence of
multiple string values.
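
The encoding rules for multiple values can be sketched as a pair of encode and decode helpers. The function names are hypothetical. The specification does not say how a value that itself ends in a backslash should be treated, so this sketch assumes that values never end in a backslash.

```python
import re
from typing import List


def encode_strings(values: List[str]) -> str:
    """Join string values with line breaks, escaping embedded line breaks."""
    return "\n".join(value.replace("\n", "\\\n") for value in values)


def decode_strings(field: str) -> List[str]:
    """Split on line breaks that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\)\n", field)
    # Drop the escaping backslash; it is not part of the value content.
    return [part.replace("\\\n", "\n") for part in parts]


def decode_tokens(field: str) -> List[str]:
    """Non-string values are separated by one or more white space characters."""
    return field.split()


values = ["Here is a string value\nwith an embedded line break.", "second"]
encoded = encode_strings(values)
print(decode_strings(encoded) == values)
print(decode_tokens("token1 token2  token3"))
```

Both helpers keep the values in a list, so the original order of multiple values is preserved on a round trip, as the specification requires.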

5.4 Property Value Range 

For any given property, the set of allowed values for that property may either be bounded or 
unbounded. 

5.4.1 Bounded 

The set of allowed values for the given property is finite and explicitly defined. Some
property value ranges are bounded by definition, being based on or derived from fixed
standards (e.g. language, coverage, format, etc.). Most properties with bounded value
ranges are token types having a controlled set of allowed values.

5.4.2 Unbounded

The set of allowed values for the given property is infinite, though perhaps otherwise
constrained by format or other characteristics as defined for the property value type.

5.5 Property Value Ranking

For any given property, the set of allowed values for that property may be ordered by an
implicit or explicit ordinal ranking, either presumed by all applications operating on or
referencing those values or defined explicitly in the schema declaration of those values.
Some property value types are ranked implicitly due to their type and subsequently the
value ranges of all properties of such types are automatically ranked (e.g. Integer, Count,
Date, Time, etc.). Most properties with ranked value ranges are token types having a
controlled set of allowed values which have a significant sequential ordering (e.g. status,
release, milestone, etc.).

Ranking may either be strict or partial. With strict ranking, no two values for a given
property may share the same ranking. With partial ranking, multiple values may share the
same rank, or may be unspecified for rank, having the implicit default rank of zero.

Ranked properties may only have single values. This is a special constraint which follows
logically from the fact that ranking defines a relationship between objects having ranked
values, and comparisons between ranked values become potentially ambiguous if multiple
values are allowed. E.g. if the values x, y, and z for property P have the rankings 1, 2, and 3
respectively, and object 'foo' has the property P(y) and object 'bar' has the property P(x,z),
then a boolean query such as "foo.P < bar.P" cannot be resolved to a single boolean result,
as y is both less than z and greater than x, and thus the query is both true and false,
depending on which value is chosen for bar.P (i.e. foo.P(y) < bar.P(x) = False, while
foo.P(y) < bar.P(z) = True).

Ranking for all property types other than token is defined implicitly by the data type,
usually conforming to fundamental mathematical or industry standard conventions.
Ranking for token property values is specified using Ranking values as defined in section
5.2.9.
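
The ambiguity that motivates the single-value constraint on ranked properties can be reproduced directly. The sketch below evaluates a "less than" query for every pairing of candidate values. The function name `less_than_outcomes` is illustrative, and the ranks are those of the x, y, z example.

```python
from itertools import product
from typing import Dict, Sequence, Set


def less_than_outcomes(left: Sequence[str], right: Sequence[str],
                       rank: Dict[str, int]) -> Set[bool]:
    """Evaluate `left < right` for every pairing of candidate values."""
    return {rank[a] < rank[b] for a, b in product(left, right)}


rank = {"x": 1, "y": 2, "z": 3}

# foo.P = (y) compared with bar.P = (x, z): both outcomes occur.
print(less_than_outcomes(["y"], ["x", "z"], rank))

# With single values on both sides there is exactly one outcome.
print(less_than_outcomes(["y"], ["z"], rank))
```

A result set containing both True and False is exactly the case the constraint rules out: the query has no single boolean answer.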

5.5.1 Strict

The set of allowed values for the given property corresponds to a strict ordering, and each
value is associated with a unique ranking within that ordering.

5.5.2 Partial

The set of allowed values for the given property corresponds to a partial ordering, and each
value is associated with a ranking within that ordering, defaulting to zero if not otherwise
specified.

5.5.3 None

The set of allowed values for the given property corresponds to a free ordering, and any
ranking specified for any value is disregarded.

6 Metadata Properties

MARS is made up of sets of metadata properties grouped into modules. Each module
corresponds to a particular function or purpose which the properties contained in that
module share. Modules are an organizational convenience and do not have any significance
to any of the processes or applications operating on MARS compliant metadata.
Applications are not expected to know of, nor required to provide any behavior relating to
modules. Note that modules do not represent individual namespaces or scopes, and thus no
two modules may have properties with the same name.

MARS specifies a set of core properties which are common to all processes and tools
operating within the Metia Framework, both for documentation production as well as
distribution. Additional properties can be defined and used as required by particular
processes or needs, and the methods used for defining, encoding, and validating metadata
support flexible extensibility of the metadata vocabulary.

Nearly all properties are persistent, meaning that they are intended to be defined and stored
in some explicit encoding. Some properties, however, are not persistent, but are used only
for communication between software components operating within the Metia Framework.
One such property is 'action', which specifies what operation is to be performed by the
agent receiving a particular MARS encoded query.

In the sections that follow, metadata properties whose values may be environment
dependent are marked with an asterisk '*' and metadata properties which may not always be
persistent are marked with a section symbol '§'.



61 Identity 

The properties defined in the Identity module are the heart of the MARS metadata model.

As the module name implies, these properties are used to encode the unique identity of data
entities, both abstract and concrete. The identity properties are scoping, meaning that they
define a hierarchy of levels, corresponding to Media Object, Instance, Component, and Item
(see diagram below).

The "identifier" property identifies an abstract media object.

The four properties "release", "language", "coverage", and "encoding" together, along with
the "identifier" property, identify an abstract media instance.

The "component" property, together with the higher scoped properties, identifies an abstract
media component.

The "item" property, together with the higher scoped properties, identifies a concrete
storage item.

It is important to note that the Identity properties differ from all other properties in that
some value is required in order to fully identify any discrete body of data. Tools operating
on MARS metadata are permitted to presume that the specified default values are valid if no
other value is provided.
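
As an illustration only, the following sketch shows how a tool might complete a partial Identity property set by falling back to the default values given in the property definitions of this section. The function name `resolve_identity` and the use of a plain dictionary are assumptions made for the example; they are not part of MARS.

```python
# Defaults as listed in the Identity property definitions of this section.
# "identifier" has no default and must always be supplied by the caller.
IDENTITY_DEFAULTS = {
    "release": "0",
    "language": "none",
    "coverage": "global",
    "encoding": "binary",
    "component": "data",
    "item": "data",
}

# Scope order, from Media Object down to Item.
IDENTITY_ORDER = ("identifier", "release", "language", "coverage",
                  "encoding", "component", "item")


def resolve_identity(query):
    """Return a complete Identity property set, filling gaps with defaults."""
    if "identifier" not in query:
        raise ValueError("'identifier' has no default and must be supplied")
    resolved = dict(IDENTITY_DEFAULTS)
    # Only Identity properties are copied; other query properties are ignored here.
    resolved.update({k: str(v) for k, v in query.items() if k in IDENTITY_ORDER})
    return {name: resolved[name] for name in IDENTITY_ORDER}
```

A query that names only an identifier and a language would then resolve to release "0", coverage "global", encoding "binary", component "data", and item "data".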

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Filenames, URLs, and other system specific means of identification are typically fragile, 
frequently non-portable, and do not necessarily follow any formal model or methodology, 
hampering interoperability between disparate systems. Using sets of standard metadata
properties such as those defined in the MARS Identity module provides a platform, system,
and process independent means of defining the identity of documentation entities. It also
allows systems to operate on one or more levels of scope, such as media object or instance,
using user and/or environment information to resolve abstract references to physical data
items. 

Identity properties may only have single values. This is a special constraint and follows
logically from the fact that if multiple values are allowed, there is no way to ensure that the
same values are always used or that new values are not added, essentially changing the
identity of the data. To change an Identity value is to change the data's identity. It is similar
in effect to changing a filename in a file system.

6.1.1 identifier* 

The unique identifier of an abstract media object.

Name identifier
Label Media Object Identifier
Type ID
Count Single
Range Unbounded
Ranking None
Values Any valid ID value as defined by this specification.
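
The property tables in this section all share the same fields (Name, Label, Type, Count, Range, Ranking, and optionally Default). A minimal sketch of how such a definition might be held in memory is shown below. The class `PropertyDef` and its `accepts` method are hypothetical names introduced for the example, and only the Count constraint is checked; validation of Type values is defined elsewhere in the specification and is not attempted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyDef:
    """One property definition, mirroring the fields of the tables in this section."""
    name: str
    label: str
    type: str
    count: str      # "Single" or "Multiple"
    range: str      # "Bounded" or "Unbounded"
    ranking: str    # "None" or "Strict"
    default: Optional[str] = None

    def accepts(self, values):
        """Check only the Count constraint against a list of values."""
        if not values:
            return False
        return self.count == "Multiple" or len(values) == 1


# The 'identifier' property as tabulated above.
IDENTIFIER = PropertyDef("identifier", "Media Object Identifier", "ID",
                         "Single", "Unbounded", "None")
```

With this representation, a tool could reject a property set that supplies two values for a Single-count property such as 'identifier'.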
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6.1.2 release * 

The numeric, sequential identifier for a published version of a media instance which is
maintained and/or distributed in parallel to other releases.

Name release 

Label Release 

Type Count 

Count Single 

Range Unbounded 

Ranking Strict 

Values Any valid Count value as defined by this specification. 

Default 0

The value is the numeric, sequential identifier of the independently managed release. Release
values thus both differentiate between and also order different releases over time. A release
with value '7' is considered to contain more current information than a release of the same
media object with value '4'.

Release values may typically coincide with (synchronize to) major version branch numbers
in a revision control system, corresponding to version branches directly connected to the
trunk, though this is not a requirement of MARS.

6.1.3 language

The primary language in which the data is written. 

Name language 
Label Language 
Type Token 
Count Single 
Range Bounded 
Ranking None 

Values The token value 'none', or any ISO 639 two-letter language code.
Default none

Because some graphics, photos, or other data may contain no textual information and are
undefined with regards to language, the default language value is 'none'.

See Appendix 9.1 for a complete listing of allowed ISO 639 values.

6.1.3.1 none

The data is unspecified for language (presumably because it contains no textual content).

Name none
Label None

6.1.4 coverage *

The geopolitical or application scope of the data, particularly relating to standards, policies, 
units of measure and other regional aspects. 

Name coverage 
Label Coverage 
Type Token 
Count Single 
Range Bounded 
Ranking None 

Values One of: global, europe, north_america, south_america, africa, middle_east,
asia_pacific, any ISO 3166-1 two-letter country code, or any valid Token value
as defined by this specification.
Default global

All ISO 3166-1 codes must be entered in lowercase to comply with the constraints of the
MARS Token format. ISO 3166-1 itself does not specify case as being significant, thus all
lowercase encoded values used in MARS metadata are fully compliant with ISO 3166-1.

Custom token values for the coverage property, such as those defining the scope of a
particular customer or application, may not supersede the semantics of either the values
defined by this specification or the ISO 3166-1 country codes. That is, it is not permitted to
define a custom value which has identical coverage to a MARS defined value, such as
'world' as a synonym for 'global' or 'france' as a synonym for 'fr'. The creation of ad-hoc
coverage scopes from existing defined scopes as a means of documenting current
application rather than overall relevance (e.g. 'fr_ge' for France plus Germany rather than
'europe') is highly discouraged. In general practice, one should use great restraint before
defining a new coverage value.
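
The sketch below illustrates one way a tool might classify a coverage value as a MARS region, a country code, or a custom token. The regular expression is only an assumed approximation of the MARS Token format (which is defined elsewhere in this specification), the function name `classify_coverage` is hypothetical, and the set of country codes is passed in by the caller rather than reproduced from Appendix 9.2. The check cannot detect a custom value that merely duplicates the meaning of a defined one; that remains an editorial responsibility.

```python
import re

# Region values defined by MARS for the coverage property.
MARS_REGIONS = {"global", "europe", "north_america", "south_america",
                "africa", "middle_east", "asia_pacific"}

# Assumed approximation of the MARS Token format: a lowercase letter
# followed by lowercase letters, digits, or underscores.
TOKEN_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def classify_coverage(value, country_codes):
    """Return 'region', 'country', or 'custom'; raise ValueError if not a token."""
    if not TOKEN_RE.match(value):
        raise ValueError(f"not a valid lowercase token: {value!r}")
    if value in MARS_REGIONS:
        return "region"
    if value in country_codes:
        return "country"
    return "custom"
```

Under these assumptions an uppercase country code such as "FR" is rejected, which matches the lowercase requirement stated above.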

See Appendix 9.2 for a complete listing of allowed ISO 3166-1 values.

6.1.4.1 global

Coverage is world-wide. 
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Name global 
Label Global 



6.1.4.2 europe 

Coverage applies only to Western, Northern, Southern, and Eastern Europe. 

Name europe 
Label Europe 

6.1.4.3 north_america

Coverage applies only to the United States, Canada, and Mexico.

Name north_america
Label North America

6.1.4.4 south_america

Coverage applies only to Central and South America, and the Caribbean.

Name south_america
Label South America

6.1.4.5 africa

Coverage applies only to Africa.

Name africa
Label Africa

6.1.4.6 middle_east

Coverage applies only to the Middle East.

Name middle_east
Label Middle East

6.1.4.7 asia_pacific

Coverage applies only to Asia and the Pacific.

Name asia_pacific
Label Asia-Pacific

6.1.5 encoding* 

The syntactic and semantic encoding of the data. 



Name encoding
Label Media Encoding
Type Encoding
Count Single
Range Bounded
Ranking None
Values Either binary or any valid Encoding as defined by this specification.
Default binary

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6.1.5.1 binary 

Data has literal binary encoding which is not expected to be parsed in any fashion. 

Name binary
Label Literal Binary Encoding
Content Type application/octet-stream
Suffix bin



6.1.6 component *

The abstract component of a media object or media instance.

Name component
Label Component
Type Token
Count Single
Range Bounded
Ranking None
Values One of: data, meta, toc, index, glossary, or other defined token value.
Default data

Typically, components belong to a media instance, though components can also be defined
for an abstract media object itself, defining properties and other characteristics shared by all
instances of that media object.

6.1.6.1 data 

Represents the data content component.

Name data
Label Data Content

6.1.6.2 meta

Represents the metadata component.

Name meta
Label Metadata

6.1.6.3 toc

Represents the table of contents component.

Name toc
Label Table of Contents
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6.1.6.4 index 

Represents the index component.

Name index
Label Index

6.1.6.5 glossary

Represents the glossary component.

Name glossary
Label Glossary

6.1.7 item *

The concrete, physical item belonging to a media component. 

Name item 
Label Item 
Type Token 
Count Single 
Range Bounded 
Ranking None 

Values One of: data, meta, idmap, or lock. 
Default data 

Most item property values are significant only to the Generalized Media Archive. In nearly 
all cases, end users will never specify nor concern themselves with item property values 
directly, but will interact primarily with components. 

6.1.7.1 data 

Contains the actual data content of the component 

Name data 

Label Data Content 



6.1.7.2 meta 

Management metadata for the data item of the same component. 

Name meta
Label Metadata

6.1.7.3 idmap 

Symbolic ID pointer to content fragment mapping table. 

Name idmap
Label ID Pointer to Fragment Map

This item is mandatory for each data item which has statically partitioned data containing
internal cross reference targets and defines a mapping from each symbolic XPointer
reference to the number of the fragment containing that target (e.g. "#xyz" -> "123").
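
A minimal sketch of the lookup that such a mapping table supports is given below. The in-memory dictionary and the function name `fragment_for` are assumptions for illustration; the storage format of the idmap item itself is not specified here, and the "#intro" entry is invented sample data.

```python
def fragment_for(idmap, pointer):
    """Return the number of the fragment containing the target of a pointer."""
    try:
        return int(idmap[pointer])
    except KeyError:
        raise KeyError(f"no fragment recorded for pointer {pointer!r}") from None


# Sample table: each symbolic pointer maps to the fragment holding its target.
idmap = {"#xyz": "123", "#intro": "1"}
```

A cross reference to "#xyz" can then be redirected to fragment 123 without loading any other fragment of the data item.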

6.1.7.4 lock 

Marker preventing accidental collisions between concurrent management systems or
sessions. 

Name lock 

Label Modification Lock 

The format and nature of the lock item is dependent on the GMA managing the component. 
6.2 Item Qualifier 
6.2.1 pointer* 

A reference to a particular structural element or sequence of elements within the data
content, encoded as an XPointer string. Typically a pointer to an element ID value (e.g.
"«BrD3828l'*).

Name pointer
Label Content Pointer
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid XPointer reference string.

6.2.2 revision

The number of a particular editorial revision milestone for the release.

Name revision
Label Editorial Revision
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined in this specification.
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6.2.3 fragment 

The number of a specific, static, linear sub-sequence of the data content of the component.

Name fragment
Label Data Content Fragment
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined in this specification.

6.3 Management

The properties defined within the Management module relate to the control of processes
operating on or directed by MARS metadata, such as retrieval, storage, change management
(also referred to as version management), etc. It does not include metadata properties
which might be needed for other higher level management processes such as workflow
management, package/configuration management, or editorial process lifecycle
management. Such processes can be built on top of the functionality provided by this and
other modules.

6.3.1 action §

The action or operation which a particular Metia Framework Agent is to perform.

Name action
Label Action
Type Token
Count Multiple
Range Bounded
Ranking None
Values One of: store, retrieve, generate, remove, qualify, locate, lock, or unlock.

A software application must assume default values for unspecified Identity properties as
defined by this standard, and/or apply values based on user and/or environment
configurations, in order to resolve any given query to a physical item.

Multiple actions can be specified at any given time, in which case they are to be applied in
the order specified to the data resulting from any preceding actions, or otherwise to the
originally specified data.

This permits the convenient specification of compound actions such as 'generate store', 'lock
retrieve', 'store unlock', or 'locate remove'.
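
The ordering rule for compound actions can be sketched as a simple pipeline in which each action receives the data produced by the action before it. In this example the whitespace-separated encoding of multiple action values, the function name `run_actions`, and the `handlers` mapping of action names to callables are all assumptions made for illustration; an actual agent would supply its own implementations.

```python
VALID_ACTIONS = {"store", "retrieve", "generate", "remove",
                 "qualify", "locate", "lock", "unlock"}


def run_actions(action_value, handlers, data=None):
    """Apply each action in the order given, feeding each result to the next."""
    actions = action_value.split()
    unknown = [name for name in actions if name not in VALID_ACTIONS]
    if unknown:
        raise ValueError(f"unknown action(s): {unknown}")
    for name in actions:
        # The first action sees the originally specified data; later
        # actions see the result of the preceding action.
        data = handlers[name](data)
    return data
```

For 'generate store', the handler for 'store' would therefore receive whatever the handler for 'generate' returned.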

6.3.1.1 store

Store a data stream, associating it with the item defined by the Identity property values
otherwise provided in the same query.

Name store
Label Store Data

6.3.1.2 retrieve

Retrieve the data stream associated with the item defined by the Identity property values
otherwise provided in the same query.

Name retrieve
Label Retrieve Data

6.3.1.3 generate

Generate a new data stream, possibly derived from an input data stream, associating it with
the item defined by the Identity property values otherwise provided in the same query.

Name generate 
Label Generate Data 



6.3.1.4 remove

Remove (delete/destroy) the data defined by the Identity property values otherwise provided
in the same query.

Name remove
Label Remove Data

6.3.1.5 qualify

Return a boolean value indicating the existence, validity, or other status of the data defined
by the Identity property values otherwise provided in the same query.

Name qualify
Label Qualify Data

6.3.1.6 locate

Return one or more complete item property value sets for all items matching in some
fashion the set of properties provided in the query.

Name locate
Label Locate Data

6.3.1.7 lock

Set the modification lock for the item defined by the Identity property values otherwise
provided in the same query.

Name lock
Label Set Modification Lock

6.3.1.8 unlock

Release the modification lock for the item defined by the Identity property values otherwise
provided in the same query.

Name unlock
Label Release Modification Lock

6.3.2 agency *

The CGI URL prefix to the Metia Framework Agency where the data resides; typically to a
Generalized Media Archive.

Name agency
Label Agency CGI URL
Type Agency
Count Single
Range Unbounded
Ranking None
Values Any valid Agency value as defined by this specification.

6.3.3 location* 

A URL from which the data can be retrieved; typically a combination of the agency CGI
prefix, the action 'retrieve', and the Identity properties of the data.

Name location
Label Location
Type URL
Count Single
Range Unbounded
Ranking None
Values Any valid URL value as defined by this specification.
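
By way of illustration, the sketch below combines an agency prefix, the 'retrieve' action, and a set of Identity property values into a single URL. The query-string encoding (name=value pairs joined by '&') is an assumption made for the example, since the URL encoding of MARS properties is defined elsewhere in this specification; the function name `build_location` and the host used in the usage note are likewise hypothetical.

```python
from urllib.parse import urlencode


def build_location(agency, identity):
    """Combine an agency CGI prefix, the 'retrieve' action, and Identity values."""
    params = [("action", "retrieve")] + list(identity.items())
    # Append to an existing query string if the prefix already has one.
    separator = "&" if "?" in agency else "?"
    return agency + separator + urlencode(params)
```

Called with the hypothetical prefix "http://gma.example.com/cgi-bin/gma" and the properties identifier "doc123" and language "en", this yields a URL whose query part is "action=retrieve&identifier=doc123&language=en".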



6.3.4 size

The total number of bytes of data. Can be used as a simple checksum for data transfers or
other operations.

Name size
Label Size
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined by this specification.



6.3.5 relevance § 

The relevance of the data with regards to the ideal target of a search query or similar form
of comparison to other data. A value of zero indicates no relevance. A value of 100
indicates full relevance or a "perfect match".

Name relevance
Label Relevance
Type Percentage
Count Single
Range Bounded
Ranking Strict
Values Any valid Percentage value as defined by this specification.

The relevance property is used almost exclusively as a transient value whenever a score or
other proximity value must be specified in relation to a search query or other similar
operation. It is not intended to be stored persistently, as its meaning is highly contextual and
typically valid only within the scope of the results from a particular action by an agent.



6.3.6 status 



The general lifecycle status of the data; typically indicating the maturity of the content and 
controlling release to specific audiences. 

Name status
Label Status
Type Token
Count Single
Range Bounded
Ranking Strict
Values One of: draft, approved, or expired.

6.3.6.1 draft

The content either has not been created yet or is currently being created or modified and is
not likely to be fully valid for its intended purpose.

Name draft
Label Draft
Rank 1

6.3.6.2 approved

The content has been verified as correct and valid for its intended purpose.

Name approved
Label Approved
Rank 2



6.3.6.3 expired

The content is no longer valid for its intended purpose and/or is no longer maintained.

Name expired
Label Expired
Rank 3

6.3.7 access *

Corresponds to one or more user and/or group identifiers specifying users having rights to
modify content.

Name access
Label Access
Type String
Count Multiple
Range Unbounded
Values Any valid String value as defined by this specification, and which conforms to
the access control mechanisms in use in the given environment.



6.3.8 revision* 

The sequential editorial milestone identifier for a particular revision of the data item of a
media component, incremented with each store action following modifications to the data
content.

Name revision
Label Revision
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined by this specification.

6.3.9 comment § 

A note or comment documenting an operation performed on the data (e.g. the change note
for a given modification).

Name comment
Label Comment
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.



6.3.10 tool* 

A full description of the name and version of the tool used to create or last modify the data.

Name tool
Label Tool Description
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.



6.3.11 created 

The time when the data was first created. 



Name created
Label Time Created
Type Time
Count Single
Range Unbounded
Ranking Strict
Values Any valid Time value as defined by this specification.

6.3.12 locked 

The time when the data was locked. 

Name locked
Label Time Locked
Type Time
Count Single
Range Unbounded
Ranking Strict
Values Any valid Time value as defined by this specification.

6.3.13 modified 

The time when the data was last modified. 

Name modified
Label Time Last Modified
Type Time
Count Single
Range Unbounded
Ranking Strict
Values Any valid Time value as defined by this specification.

6.3.14 approved

The time when the data was approved. 

Name approved 

Label Time Approved 

Type Time 

Count Single 

Range Unbounded 

Ranking Strict 

Values Any valid Time value as defined by this specification. 



6.3.15 reviewed 

The time when the data was last reviewed. 

Name reviewed
Label Time Last Reviewed
Type Time
Count Single
Range Unbounded
Ranking Strict
Values Any valid Time value as defined by this specification.



6.3.16 validated

The time when the data was last validated.

Name validated
Label Time Last Validated
Type Time
Count Single
Range Unbounded
Ranking Strict
Values Any valid Time value as defined by this specification.



6.3.17 start_pov

The date after which the content is valid.

Name start_pov
Label Start of Period of Validity
Type Date
Count Single
Range Unbounded
Ranking Strict
Values Any valid Date value as defined by this specification.

6.3.18 end_pov

The date up to which the content is valid.

Name end_pov
Label End of Period of Validity
Type Date
Count Single
Range Unbounded
Ranking Strict
Values Any valid Date value as defined by this specification.



6.3.19 expiration

The date after which the data no longer need be stored or managed and can be discarded
(after optional archival).

Name expiration
Label Expiration Date
Type Date
Count Single
Range Unbounded
Ranking Strict
Values Any valid Date value as defined by this specification.

6.3.20 mrn §

A Media Resource Name (MRN) derived from the set of Identity and Qualifier properties as
defined by this specification.

Name mrn
Label Media Resource Name
Type MRN
Count Single
Range Unbounded
Ranking None
Values Any valid MRN value as defined in this specification.

Values for the 'mrn' property are typically not stored statically with the property set of a
given object or instance, but are a convenience mechanism used by particular Metia
Framework agents for internally defining and referencing storage items via single string
index keys.

If an MRN value is stored in any fashion by any Agency, it is the responsibility of the
Agency to maintain absolute synchronization between the MRN value and all of its
component values from which the MRN is derived.
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6.4 Affiliation 

Affiliation properties define the organizational environment or scope where data is created
and maintained.

6.4.1 function 

The business function primarily responsible for the creation, validation, and maintenance of
the data content.

Name function
Label Business Function
Type Token
Count Single
Range Bounded
Ranking None
Values One of: management, finance, sales, marketing, research_and_development,
human_resources, legal, intellectual_property_rights, purchasing, sourcing,
production, manufacturing_technology, quality, information_management,
logistics, customer_service, business_administration, or
business_management.

6.4.1.1 finance 

Name finance 
Label Finance 



6.4.1.2 sales 

Name sales 
Label Sales

6.4.1.3 marketing

Name marketing
Label Marketing

6.4.1.4 research_and_development

Name research_and_development
Label Research and Development

6.4.1.5 human_resources

Name human_resources
Label Human Resources

6.4.1.6 legal

Name legal
Label Legal

6.4.1.7 intellectual_property_rights

Name intellectual_property_rights
Label Intellectual Property Rights

6.4.1.8 purchasing

Name purchasing
Label Purchasing

6.4.1.9 sourcing

Name sourcing
Label Sourcing

6.4.1.10 production

Name production
Label Production

6.4.1.11 manufacturing_technology

Name manufacturing_technology
Label Manufacturing Technology

6.4.1.12 quality

Name quality
Label Quality

6.4.1.13 information_management

EP 1 244 032 A1 



Name information_management
Label Information Management

6.4.1.14 logistics

Name logistics
Label Logistics

6.4.1.15 customer_service

Name customer_service
Label Customer Service

6.4.1.16 business_administration

Name business_administration
Label Business Administration



6.4.2 organization *

The top-level organization to which the data belongs.

Name organization
Label Organization
Type Token
Count Single
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

6.4.3 business_unit *

The business unit to which the data belongs. 

Name business_unit 

Label Business Unit 

Type Token 

Count Multiple 

Range Bounded 

Ranking None 

Values Any valid Token value as defined by this specification. 
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The values for this property must be defined separately by each individual organization for 
all business units within that organization.

6.4.4 product_family *

The product family to which the data belongs.

Name product_family
Label Product Family
Type Token
Count Multiple
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization or
business unit for all product families within that organization and/or business unit.

6.4.5 product *

The product to which the data belongs.

Name product
Label Product
Type Token
Count Multiple
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line for all products within that organization, business unit, and/or
product line.

6.4.6 product_release *

The product release to which the data belongs.

Name product_release
Label Product Release
Type Token
Count Multiple
Range Bounded
Ranking Strict
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line for all product releases within a given product.

6.4.7 project *

The project to which the data belongs.

Name project
Label Project
Type Token
Count Multiple
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line for all projects within that organization, business unit, and/or
product line.



6.4.8 process *

The process to which the data belongs.

Name process
Label Process
Type Token
Count Multiple
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line for all processes within that organization, business unit, and/or
product line.

6.4.9 milestone *

A symbolic milestone with which the data is associated.

Name milestone
Label Milestone
Type Token
Count Multiple
Range Bounded
Ranking Strict
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line for all processes within that organization, business unit, and/or
product line.

6.5 Content

Content properties define characteristics about data, often irrespective of its production,
application, or realization.

6.5.1 publisher

The entity responsible for making the data available. Typically the organization owning the
data.

Name publisher
Label Publisher
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.

6.5.2 rights

Information about rights held in and over the data. Typically a copyright notice.

Name rights
Label Rights
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.

6.5.3 confidentiality

The level of permitted access to the data.

Name confidentiality
Label Confidentiality
Type Token
Count Single
Range Bounded
Ranking Strict
Values One of: public, company, confidential, or secret.

6.5.3.1 public

Access to the data is unrestricted.

Name public
Label Public
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Rank 1 



6.5.3.2 company 

Access to the data is restricted to company personnel. 

Name company 

Label Company Confidential 

Rank 2 



6.5.3.3 confidential 

Access to the data is restricted to those who are entitled by virtue of their duties. 

Name confidential 
Label Confidential

Rank 3 



6.5.3.4 secret 

Access to the data is restricted to the owner and to individuals named by the owner. 

Name secret 
Label Secret 
Rank 4 
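
Because the confidentiality values carry strict ranks (public 1, company 2, confidential 3, secret 4), they can be compared numerically. The sketch below shows one possible use, filtering items against a ceiling level. Treating a higher rank as more restrictive, and the function names `at_most` and `filter_by_clearance`, are assumptions made for the example rather than behavior mandated by this specification.

```python
# Ranks as given in the value definitions above.
CONFIDENTIALITY_RANK = {"public": 1, "company": 2, "confidential": 3, "secret": 4}


def at_most(level, ceiling):
    """True if 'level' ranks no higher than 'ceiling'."""
    return CONFIDENTIALITY_RANK[level] <= CONFIDENTIALITY_RANK[ceiling]


def filter_by_clearance(items, ceiling):
    """Keep the names of (name, level) pairs whose level does not exceed the ceiling."""
    return [name for name, level in items if at_most(level, ceiling)]
```

With a ceiling of 'company', items marked 'public' or 'company' are kept and items marked 'confidential' or 'secret' are dropped.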



6.5.4 title 

The name given to the data, usually by the creator. 

Name title
Label Title
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.



6.5.5 description 

A textual description of the data content. 

Name description 

Label Description 

Type String 

Count Single 
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Range Unbounded 
Ranking None 

Values Any valid String value as defined by this specification. 



6.5.6 type 

The content type represented by the data. 

Name type 

Label Content Type
Type Token
Count Single
Range Bounded
Ranking None
Values One of: general, product, project, process, management, or business.

6.5.6.1 general

Content is used for general purposes.

Name general
Label General Content

6.5.6.2 product

Content is used for product related purposes.

Name product
Label Product Related Content

6.5.6.3 project

Content is used for project related purposes.

Name project
Label Project Related Content

6.5.6.4 process

Content is used for process related purposes.

Name process
Label Process Related Content

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6.5.6.5 management 
Content is used for management related purposes. 

Name management 
Label Management Related Content 

6.5.6.6 business 

Content is used for business related purposes. 

Name business 
Label Business Related Content

6.5.7 class *

One or more topical, scope, typing, application, or other classificatory identifiers.

Name class
Label Classification
Type Token
Count Multiple
Range Bounded
Ranking None
Values Any valid Token value as defined by this specification.

The values for this property must be defined separately by each individual organization,
business unit, or product line in accordance with their classification needs.

6.5.8 keywords *

One or more keywords (or terms or phrases) used to classify the general content of the data.

Name keywords
Label Keywords
Type String
Count Multiple
Range Unbounded
Ranking None
Values Any valid String value as defined by this specification.

This property is intended to be used when the values defined for the 'class' property are not
fully sufficient for the classification needed or when classification must be based on
identifiers which are not valid Tokens. Care should be taken to ensure that it is not used in
lieu of the 'class' property when the latter property offers one or more suitable values.
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6.6 Encoding 

Encoding properties define special qualities relating to the format, structure, or general
serialization of data streams which are significant to tools and processes operating on that
data.

6.6.1 content_type *

The MIME content type of the data.

Name content_type
Label MIME Content Type
Type String
Count Single
Range Bounded
Ranking None
Values Any valid MIME content type value.
Default "application/octet-stream"

The default MIME content type value corresponds to an otherwise unspecified stream of
binary data, and coincides with the default values for the 'encoding' and 'suffix' properties.
See Appendix 9.3 for a listing of the most commonly used MIME content type values.

6.6.2 suffix *

The filename suffix associated with a particular encoding.

Name suffix
Label Filename Suffix
Type String
Count Single
Range Unbounded
Ranking None
Values Any valid String value as defined in this specification.
Default "bin"

The default suffix value corresponds to an otherwise unspecified stream of binary data, and
coincides with the default values for the 'encoding' and 'mime' properties.

6.6.3 schema*

The identifier for a DTD, XML Schema, or other like mechanism defining the
syntactic/structural model of the data (if any).

Name schema 

Label Schema 

Type String 

Count Single 

Range Unbounded 

Ranking None

Values Any valid String value as defined by this specification. 

The structure and interpretation of schema string values is environment and system
dependent.



6.6.4 aspect

Selection criteria for inclusion of the data within a given context, process, scope, or other
conditional application.



Name aspect 

Label Aspect 

Type String

Count Single 

Range Unbounded 

Ranking None 

Values Any valid String value as defined by this specification. 



Aspect values are typically defined within structured document instances and seldom stored 
as persistent metadata externally. 

6.6.5 character_set

The MIME character set identifier for the primary or base character set in which textual
content is encoded.



Name character_set
Label MIME Character Set
Type String
Count Single
Range Bounded
Ranking None
Values Any valid MIME character set identifier.



6.6.6 line_delimiter

The line delimiter character or character sequence for textual content. 



Name line_delimiter
Label Line Delimiter
Type Token
Count Single
Range Bounded
Ranking None
Values One of lf, cr, crlf, or any valid Token value as defined by this specification.

6.6.6.1 lf

Lines of content are delimited by line feed (lf) characters (also called newline characters).
This is the line delimitation method for Unix, Linux, Windows NT/2000, and most POSIX
compliant operating systems.

Name lf
Label Line Feed

6.6.6.2 cr 

Lines of content are delimited by carriage return (cr) characters. This is the line delimitation
method for the Macintosh operating system.

Name cr 

Label Carriage Return 

6.6.6.3 crlf

Lines of content are delimited by an ordered adjacent pair of carriage return and line feed 
characters. This is the method for MS-DOS and Windows 95/98 operating systems. 

Name crlf 

Label Carriage Return + Line Feed 
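
As an illustration of how a tool might derive or apply a 'line_delimiter' token for textual content, the following Python sketch counts each delimiter form and reports the dominant one. The function names and the tie-breaking rule are assumptions made for this example; the specification itself only defines the token values.

```python
import re

# Token values defined for the 'line_delimiter' property.
DELIMITERS = {"lf": "\n", "cr": "\r", "crlf": "\r\n"}
TOKEN_FOR = {sequence: token for token, sequence in DELIMITERS.items()}

# CR+LF is listed first so the pair is not counted as a lone cr plus a lone lf.
_ANY_DELIMITER = re.compile(r"\r\n|\r|\n")


def detect_line_delimiter(text):
    """Return 'lf', 'cr', or 'crlf' for the most frequent delimiter, or None."""
    counts = {"lf": 0, "cr": 0, "crlf": 0}
    for match in _ANY_DELIMITER.finditer(text):
        counts[TOKEN_FOR[match.group()]] += 1
    if not any(counts.values()):
        return None
    # Ties fall to the first key in dict order (lf, cr, crlf); an arbitrary choice.
    return max(counts, key=counts.get)


def convert_line_delimiter(text, token):
    """Rewrite every line delimiter in text to the one named by token."""
    replacement = DELIMITERS[token]
    return _ANY_DELIMITER.sub(lambda _match: replacement, text)
```

A tool could call `detect_line_delimiter` when content is stored and record the result as the property value, then use `convert_line_delimiter` when delivering the content to a platform with a different convention.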



6.6.7 width_in_millimeters

Absolute width dimension in millimeters.

Name width_in_millimeters
Label Width in Millimeters
Type Count
Count Single

Range Unbounded 

Ranking Strict 

Values Any valid Count value as defined by this specification. 



6.6.8 height_in_millimeters

Absolute height dimension in millimeters.

Name height_in_millimeters
Label Height in Millimeters
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined by this specification.

6.6.9 width_in_pixels

Absolute width dimension in pixels.

Name width_in_pixels
Label Width in Pixels
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined by this specification.



6.6.10 height_in_pixels

Absolute height dimension in pixels.

Name height_in_pixels
Label Height in Pixels
Type Count
Count Single
Range Unbounded
Ranking Strict
Values Any valid Count value as defined by this specification.



6.6.11 resolution

Resolution of an image or the desired rendering resolution in dots per inch (dpi) for
graphical data encodings.

Name resolution
Label Resolution (dpi)

Type Count 

Count Single 

Range Unbounded 

Ranking Strict 

Values Any valid Count value as defined by this specification. 
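
The pixel, millimeter, and resolution properties are related by simple arithmetic: a dimension in pixels divided by the resolution in dpi gives inches, and one inch is 25.4 mm. The Python sketch below shows that conversion. The function names, and the rounding to whole numbers so that results fit the Count type, are assumptions for illustration and are not defined by this specification.

```python
MM_PER_INCH = 25.4


def pixels_to_millimeters(pixels, dpi):
    """Physical size, in whole millimeters, of `pixels` rendered at `dpi`."""
    if dpi <= 0:
        raise ValueError("resolution must be a positive Count")
    return round(pixels / dpi * MM_PER_INCH)


def millimeters_to_pixels(millimeters, dpi):
    """Number of whole pixels needed to cover `millimeters` at `dpi`."""
    if dpi <= 0:
        raise ValueError("resolution must be a positive Count")
    return round(millimeters / MM_PER_INCH * dpi)
```

For example, a 600 pixel wide image with a resolution of 300 dpi is two inches wide, which rounds to a 'width_in_millimeters' value of 51.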



6.6.12 compression

The method used for compression of graphical data encodings. 

Name compression 

Label Compression 

Type Token 

Count Single 

Range Bounded 

Ranking None 

Values Any valid Token value as defined by this specification.

6.6.13 color_depth

The total number of bits per pixel (bpp) used to encode individually displayable colors in
graphical data encodings.

Name color_depth

Label Color Depth (bpp)

Type Count

Count Single 

Range Unbounded 

Ranking Strict 

Values Any valid Count value as defined by this specification. 

6.6.14 color_space

The color space (model) used for graphical data encodings.

Name color_space
Label Color Space
Type Token
Count Single
Range Unbounded
Ranking None
Values One of rgb, rgba, cmyk, hsl, or any valid Token value as defined by this
specification.

6.6.14.1 rgb

Red/Green/Blue (RGB).

Name rgb
Label Red/Green/Blue (RGB)

6.6.14.2 rgba

Red/Green/Blue/Alpha (RGBA).

Name rgba
Label Red/Green/Blue/Alpha (RGBA)

6.6.14.3 cmyk

Cyan/Magenta/Yellow/blacK (CMYK).

Name cmyk
Label Cyan/Magenta/Yellow/blacK (CMYK)

6.6.14.4 hsl

Hue/Saturation/Lightness (HSL). 

Name hsl 

Label Hue/Saturation/Lightness (HSL) 



6.7 Association 

Association properties define special relationships relating to the origin, scope, and/or focus
of the content in reference to other data. Values may be any valid URI, though it is
recommended that wherever possible, MRNs be used.

6.7.1 source*

Resource(s) from which the data is derived.

Name source
Label Source
Type URI
Count Multiple
Range Unbounded

Ranking None 

Values Any valid URI value as defined by this specification. 



6.7.2 refers*

Resource(s) to which the data refers.

Name refers
Label Refers To
Type URI
Count Multiple
Range Unbounded
Ranking None
Values Any valid URI value as defined by this specification.



6.7.3 supersedes ♦

Resource(s) which the data supersedes or replaces.

Name supersedes 

Label Supersedes 

Type URI 

Count Multiple 

Range Unbounded 

Ranking None 

Values Any valid URI value as defined by this specification.

6.7.4 summarizes

Resource(s) which the data summarizes.

Name summarizes
Label Summarizes
Type URI
Count Multiple
Range Unbounded
Ranking None
Values Any valid URI value as defined by this specification.



6.7.5 expands *

Resource(s) which the data expands. 

Name expands 

Label Expands 

Type URI 

Count Multiple 

Range Unbounded 

Ranking None 

Values Any valid URI value as defined by this specification. 



6.7.6 includes §*

Resource(s) which are included as partial content for the data as a whole.

Name includes 
Label Includes 
Type URI 
Count Multiple 
Range Unbounded 
Ranking None 

Values Any valid URI value as defined by this specification. 

6.8 Role

Role properties specify one or more actors who have a special relationship with the data. An
actor is usually a person, but can also be a software application.

6.8.1 user§*

Identifier of actor performing operation on or currently having modification rights to data.

Name user

Label User 

Type Actor 

Count Single 

Range Unbounded 

Ranking None 

Values Any valid Actor value as defined by this specification. 



This property value is required to be persistent only when a modification lock is in force. 
Otherwise, it is typically transient for any given operation. 



6.8.2 creator*

Identifier of actor who created the original data. 

Name creator 

Label Creator 

Type Actor 

Count Single 

Range Unbounded 

Ranking None 

Values Any valid Actor value as defined by this specification. 



6.8.3 owner * 

Identifier of actor who has primary rights and responsibilities for the data.

Name owner 

Label Owner 

Type Actor 

Count Single 

Range Unbounded 

Ranking None 

Values Any valid Actor value as defined by this specification.

6.8.4 modifier*

Identifier of actor who last modified the data.

Name modifier 

Label Modifier 

Type Actor 

Count Single 

Range Unbounded 

Ranking None 

Values Any valid Actor value as defined by this specification.

6.8.5 approver*

Identifier(s) of actor(s) responsible for the quality and correctness of the data. 

Name approver 

Label Approver 

Type Actor 

Count Multiple 

Range Unbounded 

Ranking None 

Values Any valid Actor value as defined by this specification. 



6.8.6 contributor* 

Identifier(s) of actor(s) having contributed to the data. 

Name contributor 

Label Contributor
Type Actor
Count Multiple
Range Unbounded
Ranking None
Values Any valid Actor value as defined by this specification.



6.8.7 reviewer *

Identifier(s) of actor(s) responsible for evaluating the quality and correctness of the data.

Name reviewer
Label Reviewer
Type Actor
Count Multiple
Range Unbounded
Ranking None
Values Any valid Actor value as defined by this specification.

6.8.8 distribution*

Identifier(s) of actor(s) who have a key interest in the data and are typically notified in some
fashion regarding changes in the content or status of the data.

Name distribution
Label Distribution
Type Actor
Count Multiple
Range Unbounded
Ranking None
Values Any valid Actor value as defined by this specification.

7 Serialization and Validation

Because MARS is strictly a metadata specification framework and vocabulary, there is no
required method for encoding MARS metadata property values or rules governing their
validity. However, the Generalized Media Archive (GMA) specification defines a
serialization for MARS property value sets based on XML which is suitable for both data
interchange and persistent storage, and provides a DTD and other mechanisms for
validation and processing.

8 MRN (Media Resource Name) Syntax

This specification defines a URN syntax for MARS item references which is made up of the
ordered concatenation of Identity properties, and optionally Item Qualifier properties,
separated by colons. The ordered sequence is identifier, release, language, coverage,
encoding, component, item, [revision, fragment, pointer].

All MRNs share the common fixed prefix "urn:mars:" in accordance with RFC 2141.
Note that the case of this prefix is not significant, but the case of the remainder of the URN
is significant. I.e., "URN:MARS:", "urn:mars:", and "UrN:MaRs:" are all equivalent.
It is recommended, however, that the prefix be all in lowercase, as shown in the examples,
for the sake of consistent readability across systems and environments.

There are two forms of MRN: (1) media instance component items (the typical case), and
(2) media object component items (for inherited or defining information).

In addition, either form of MRN may be qualified for revision, fragment, and/or pointer.

MRNs provide an explicit, concise, unique, consistent, and information rich identity string
value in cases where such a single identity string is needed.

MRNs identify only storage items, and not higher level abstract entities such as
components, instances or objects. Note though, that the Metia Framework Java API
provides for the notion of an MRN pattern, which can be employed to represent
metadata-related sets of items defined by valid MRNs.



8.1 Media Instance Component Item MRN 

A media instance component item MRN is required to have valid property values for every
Identity property. E.g.:

"urn:mars:dn823942931891:2:en:global:xhtml:meta:data"
"urn:mars:dn823942931891:2:fi:fi:neutral_mu:toc:data"
"urn:mars:tan82819:0:none:global:cgm_2:data:data"
"urn:mars:x928bks212_u:11:ch:asia:word:data:meta"



8.2 Media Object Component Item MRN 

Media object component item MRNs all share the same fixed sub-sequence ":*:*:*:*:"
between the identifier and component property values, and are required to have valid
property values for every identifier, component and item property. E.g.:

"urn:mars:dn823942931891:*:*:*:*:meta:data"
"urn:mars:dn823942931891:*:*:*:*:toc:data"
"urn:mars:tan82819:*:*:*:*:data:data"



The sequence ":*:*:*:*:" signifies that the defined items have global scope over all
instances, regardless of release, language, coverage, or encoding.

Note that MARS does not define how global information that is defined for media objects is
to be applied to instances, nor which components may be defined for any given media
object, nor their interpretation. MARS simply defines how those storage items are named
and organized using MARS metadata properties. In a typical environment, the only
components defined for media objects would be a meta component for global metadata
shared by all instances and possibly a data component containing a template or general
document or abstract defining the content and/or structure shared by all instances.
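
The two unqualified MRN forms can be told apart mechanically. The following Python sketch splits an unqualified MRN into its seven Identity property values and tests for the media object form. The function names and the error handling are assumptions for illustration; this specification defines only the syntax.

```python
MRN_PREFIX = "urn:mars:"
IDENTITY_FIELDS = ("identifier", "release", "language", "coverage",
                   "encoding", "component", "item")


def parse_mrn(mrn):
    """Split an unqualified MRN into a dict keyed by Identity property name."""
    # Only the prefix is case-insensitive; the remainder is kept verbatim.
    if mrn[:len(MRN_PREFIX)].lower() != MRN_PREFIX:
        raise ValueError("missing urn:mars: prefix")
    values = mrn[len(MRN_PREFIX):].split(":")
    if len(values) != len(IDENTITY_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(IDENTITY_FIELDS), len(values)))
    return dict(zip(IDENTITY_FIELDS, values))


def is_media_object_mrn(mrn):
    """True when release, language, coverage, and encoding are all '*'."""
    fields = parse_mrn(mrn)
    scope = ("release", "language", "coverage", "encoding")
    return all(fields[name] == "*" for name in scope)
```

Because the prefix comparison is lowercased while the remaining fields are left untouched, "URN:MARS:tan82819:..." and "urn:mars:tan82819:..." parse identically, whereas identifiers that differ only in case remain distinct.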

8.3 Qualified MRN 

A qualified MRN has three additional fields suffixed to an unqualified MRN, corresponding
to the property values for revision, fragment, and pointer, in that order. If any Qualifier
property is undefined, its field must contain an asterisk. All three fields are mandatory.
E.g.:

"urn:mars:tan82819:0:none:global:cgm_2:data:data:3:*:*"
"urn:mars:x928bks212_u:11:ch:asia:word:data:meta:*:234:*"
"urn:mars:dn823942931891:*:*:*:*:data:data:*:*:#EID2z821"

Combinations of values for both revision and fragment may only be meaningful if the
revision number corresponds to the latest revision (in which case the revision number is
superfluous) or if the fragment can be reliably regenerated based solely on the fragment
number, as it is expected that static fragments are typically maintained only for the latest
revision.
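
As a sketch of the qualification rule, the Python functions below append the three mandatory Qualifier fields to an unqualified MRN and split them off again. The function names, the use of None for an undefined qualifier, and the decision to let the pointer field keep any colons it contains are assumptions for illustration only.

```python
QUALIFIER_FIELDS = ("revision", "fragment", "pointer")

# "urn" + "mars" + seven Identity fields precede the qualifiers.
UNQUALIFIED_PARTS = 9


def qualify_mrn(mrn, revision=None, fragment=None, pointer=None):
    """Append the three mandatory Qualifier fields to an unqualified MRN."""
    qualifiers = (revision, fragment, pointer)
    # An undefined Qualifier property is written as an asterisk.
    suffix = ":".join("*" if value is None else str(value) for value in qualifiers)
    return mrn + ":" + suffix


def split_qualifiers(qualified_mrn):
    """Return (unqualified MRN, qualifier dict); '*' is mapped back to None."""
    total = UNQUALIFIED_PARTS + len(QUALIFIER_FIELDS)
    # Limit the split so a pointer containing colons stays in one piece.
    parts = qualified_mrn.split(":", total - 1)
    if len(parts) != total:
        raise ValueError("not a qualified MRN")
    base = ":".join(parts[:UNQUALIFIED_PARTS])
    values = [None if part == "*" else part for part in parts[UNQUALIFIED_PARTS:]]
    return base, dict(zip(QUALIFIER_FIELDS, values))
```

Calling `qualify_mrn("urn:mars:tan82819:0:none:global:cgm_2:data:data", revision=3)` yields the MRN followed by ":3:*:*", which matches the rule that all three fields are always present.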

9 Appendices



9.1 Language Property Values 

The following table lists all allowed token values for the "language" property, along with
their presentation labels, as defined in ISO 639.

Name  Label               Name  Label
aa    Afar                lv    Latvian, Lettish
ab    Abkhazian           mg    Malagasy
af    Afrikaans           mi    Maori
am    Amharic             mk    Macedonian
ar    Arabic              ml    Malayalam
as    Assamese            mn    Mongolian
ay    Aymara              mo    Moldavian
az    Azerbaijani         mr    Marathi
ba    Bashkir             ms    Malay
be    Byelorussian        mt    Maltese
bg    Bulgarian           my    Burmese
bh    Bihari              na    Nauru
bi    Bislama             ne    Nepali
bn    Bengali; Bangla     nl    Dutch
bo    Tibetan             no    Norwegian
br    Breton              oc    Occitan
ca    Catalan             om    (Afan) Oromo
co    Corsican            or    Oriya
cs    Czech               pa    Punjabi
cy    Welsh               pl    Polish
da    Danish              ps    Pashto, Pushto
de    German              pt    Portuguese
dz    Bhutani             qu    Quechua
el    Greek               rm    Rhaeto-Romance
en    English             rn    Kirundi
eo    Esperanto           ro    Romanian
es    Spanish             ru    Russian
et    Estonian            rw    Kinyarwanda
eu    Basque              sa    Sanskrit
fa    Persian             sd    Sindhi
fi    Finnish             sg    Sangro
fj    Fiji                sh    Serbo-Croatian
fo    Faeroese            si    Singhalese
fr    French              sk    Slovak
fy    Frisian             sl    Slovenian
ga    Irish               sm    Samoan
gd    Scots Gaelic        sn    Shona
gl    Galician            so    Somali
gn    Guarani             sq    Albanian
gu    Gujarati            sr    Serbian
ha    Hausa               ss    Siswati
hi    Hindi               st    Sesotho
hr    Croatian            su    Sundanese
hu    Hungarian           sv    Swedish
hy    Armenian            sw    Swahili
ia    Interlingua         ta    Tamil
ie    Interlingue         te    Telugu
ik    Inupiak             tg    Tajik
in    Indonesian          th    Thai
is    Icelandic           ti    Tigrinya
it    Italian             tk    Turkmen
iw    Hebrew              tl    Tagalog
ja    Japanese            tn    Setswana
ji    Yiddish             to    Tonga
jw    Javanese            tr    Turkish
ka    Georgian            ts    Tsonga
kk    Kazakh              tt    Tatar
kl    Greenlandic         tw    Twi
km    Cambodian           uk    Ukrainian
kn    Kannada             ur    Urdu
ko    Korean              uz    Uzbek
ks    Kashmiri            vi    Vietnamese
ku    Kurdish             vo    Volapuk
ky    Kirghiz             wo    Wolof
la    Latin               xh    Xhosa
ln    Lingala             yo    Yoruba
lo    Laothian            zh    Chinese
lt    Lithuanian          zu    Zulu

9.2 Coverage Property Values

The following table lists the allowed token values for the "coverage" property, adopted from
ISO 3166-1, along with their presentation labels.

Name  Label                           Name  Label
ad    Andorra                         lc    Saint Lucia
ae    United Arab Emirates            li    Liechtenstein
af    Afghanistan                     lk    Sri Lanka
ag    Antigua and Barbuda             lr    Liberia
ai    Anguilla                        ls    Lesotho
al    Albania                         lt    Lithuania
am    Armenia                         lu    Luxembourg
an    Netherlands Antilles            lv    Latvia
ao    Angola                          ly    Libya
aq    Antarctica                      ma    Morocco
ar    Argentina                       mc    Monaco
as    American Samoa                  md    Moldavia
at    Austria                         mg    Madagascar
au    Australia                       mh    Marshall Islands
aw    Aruba                           mk    Macedonia
az    Azerbaidjan                     ml    Mali
ba    Bosnia-Herzegovina              mm    Myanmar
bb    Barbados                        mn    Mongolia
bd    Bangladesh                      mo    Macau
be    Belgium                         mp    Northern Mariana Islands
bf    Burkina Faso                    mq    Martinique (French)
bg    Bulgaria                        mr    Mauritania
bh    Bahrain                         ms    Montserrat
bi    Burundi                         mt    Malta
bj    Benin                           mu    Mauritius
bm    Bermuda                         mv    Maldives
bn    Brunei Darussalam               mw    Malawi
bo    Bolivia                         mx    Mexico
br    Brazil                          my    Malaysia
bs    Bahamas                         mz    Mozambique
bt    Bhutan                          na    Namibia
bv    Bouvet Island                   nc    New Caledonia (French)
bw    Botswana                        ne    Niger
by    Belarus                         net   Network
bz    Belize                          nf    Norfolk Island
ca    Canada                          ng    Nigeria
cc    Cocos (Keeling) Islands         ni    Nicaragua
cf    Central African Republic        nl    Netherlands
cg    Congo                           no    Norway
ch    Switzerland                     np    Nepal
ci    Ivory Coast (Cote D'Ivoire)     nr    Nauru
ck    Cook Islands                    nt    Neutral Zone
cl    Chile                           nu    Niue
cm    Cameroon                        nz    New Zealand
cn    China                           om    Oman
co    Colombia                        pa    Panama
cr    Costa Rica                      pe    Peru
cs    Former Czechoslovakia           pf    Polynesia (French)
cu    Cuba                            pg    Papua New Guinea
cv    Cape Verde                      ph    Philippines
cx    Christmas Island                pk    Pakistan
cy    Cyprus                          pl    Poland
cz    Czech Republic                  pm    Saint Pierre and Miquelon
de    Germany                         pn    Pitcairn Island
dj    Djibouti                        pr    Puerto Rico
dk    Denmark                         pt    Portugal
dm    Dominica                        pw    Palau
do    Dominican Republic              py    Paraguay
dz    Algeria                         qa    Qatar
ec    Ecuador                         re    Reunion (French)
ee    Estonia                         ro    Romania
eg    Egypt                           ru    Russian Federation
eh    Western Sahara                  rw    Rwanda
er    Eritrea                         sa    Saudi Arabia
es    Spain                           sb    Solomon Islands
et    Ethiopia                        sc    Seychelles
fi    Finland                         sd    Sudan
fj    Fiji                            se    Sweden
fk    Falkland Islands                sg    Singapore
fm    Micronesia                      sh    Saint Helena
fo    Faroe Islands                   si    Slovenia
fr    France                          sj    Svalbard and Jan Mayen Islands
fx    France (European Territory)     sk    Slovak Republic
ga    Gabon                           sl    Sierra Leone
gb    Great Britain                   sm    San Marino
gd    Grenada                         sn    Senegal
ge    Georgia                         so    Somalia
gf    French Guyana                   sr    Suriname
gh    Ghana                           st    Saint Tome (Sao Tome) and Principe
gi    Gibraltar                       su    Former USSR
gl    Greenland                       sv    El Salvador
gm    Gambia                          sy    Syria
gn    Guinea                          sz    Swaziland
gp    Guadeloupe (French)             tc    Turks and Caicos Islands
gq    Equatorial Guinea               td    Chad
gr    Greece                          tf    French Southern Territories
gs    S. Georgia & S. Sandwich Isls.  tg    Togo
gt    Guatemala                       th    Thailand
gu    Guam (USA)                      tj    Tadjikistan
gw    Guinea Bissau                   tk    Tokelau
gy    Guyana                          tm    Turkmenistan
hk    Hong Kong                       tn    Tunisia
hm    Heard and McDonald Islands      to    Tonga
hn    Honduras                        tp    East Timor
hr    Croatia                         tr    Turkey
ht    Haiti                           tt    Trinidad and Tobago
hu    Hungary                         tv    Tuvalu
id    Indonesia                       tw    Taiwan
ie    Ireland                         tz    Tanzania
il    Israel                          ua    Ukraine
in    India                           ug    Uganda
io    British Indian Ocean Territory  uk    United Kingdom
iq    Iraq                            um    USA Minor Outlying Islands
ir    Iran                            us    United States
is    Iceland                         uy    Uruguay
it    Italy                           uz    Uzbekistan
jm    Jamaica                         va    Vatican State
jo    Jordan                          vc    Saint Vincent & Grenadines
jp    Japan                           ve    Venezuela
ke    Kenya                           vg    Virgin Islands (British)
kg    Kyrgyzstan                      vi    Virgin Islands (USA)
kh    Cambodia                        vn    Vietnam
ki    Kiribati                        vu    Vanuatu
km    Comoros                         wf    Wallis and Futuna Islands
kn    Saint Kitts & Nevis Anguilla    ws    Samoa
kp    North Korea                     ye    Yemen
kr    South Korea                     yt    Mayotte
kw    Kuwait                          yu    Yugoslavia
ky    Cayman Islands                  za    South Africa
kz    Kazakhstan                      zm    Zambia
la    Laos                            zr    Zaire
lb    Lebanon                         zw    Zimbabwe

9.3 MIME Derived Property Values

The following are the MIME content types and character sets which are expected to be most
frequently used, although any valid MIME content type or character set is permitted (though
not all may be supported by the tools and/or processes of a given environment). They are
provided here only for convenient reference.

9.3.1 Content Types

"application/http"
"application/msword"
"application/octet-stream"
"application/pdf"
"application/postscript"
"application/rtf"
"application/sgml"
"application/sgml-open-catalog"
"application/vnd.lotus-notes"
"application/vnd.mif"
"application/vnd.ms-excel"
"application/vnd.ms-powerpoint"
"application/vnd.ms-project"
"application/vnd.visio"
"application/vnd.wap.sic"
"application/vnd.wap.slc"
"application/vnd.wap.wbxml"
"application/vnd.wap.wmlc"
"application/vnd.wap.wmlscriptc"
"application/xml"
"image/cgm"
"image/gif"
"image/jpeg"
"image/png"
"image/tiff"
"image/vnd.dwg"
"image/vnd.dxf"
"model/vrml"
"text/css"
"text/enriched"
"text/html"
"text/plain"
"text/rtf"
"text/sgml"
"text/uri-list"
"text/vnd.wap.si"
"text/vnd.wap.sl"
"text/vnd.wap.wml"
"text/vnd.wap.wmlscript"
"text/xml"
"video/mpeg"
"video/quicktime"

9.3.2 Character Sets

"us-ascii"
"iso-8859-1"
"utf-8"
"utf-16"
"gb2312"
"iso-2022-jp"
"shift_jis"
"euc-kr"




10 Changes from version 1.0 to 2.0 
> Name changed from DORS to MARS 

The name of this specification was changed from Document Object Reference Semantics
(DORS) to Media Attribution and Reference Semantics (MARS) in conjunction with the
naming changes applied to all components of the Metia (NCDE) Framework.

> Added Item Qualifier concept and Property Module

Added the concept of an Item Qualifier and created a new property module named Qualifier
containing the properties 'revision', 'fragment', and 'pointer'. See section 6.2. Removed the
item token values 'data JfW /', 'revisions', and 'revisionJf#' as these are now handled by
item qualifiers.

> Added explicit definition of MARS Versioning Model

See section 4.9. 

> Added explicit definition of MARS Metadata Inheritance Behavior

See section 4.8.

> Release property Type now a simple Count value

Changing the release property to type Count preserves the ability for a system to
automatically sort releases and obtain the latest release while removing the confusion of
using Dates. The attraction of dates was that they gave a linear progression value that had
some relation to real time and actual production lives, but the confusion about the date
being that of the creation of the release (branch) rather than the "release date" of the final
approved version seemed too problematic to resolve on the large scale MARS is intended
for.

The specification of release as a Count value is also closer to many traditions of product or
system release (e.g. T9, T10, Java 2, DOM 1, DOM 2, CGM 1/2/3/4, etc.) where the
editorial version is a separate property from the release identifier, as is now the case with
MARS.

> New 'encoding' property; previous 'format' property renamed to 'content_type' 

Adopting MIME encoding strings as the value for encoding (format) properties seemed a
good idea, both because it adopted an existing standard and because it gave high status to a
property that plays a central role in a distributed Web environment. It is clear, however, that
the level of resolution provided by MIME encoding values is insufficient. Rather than
append additional information to the MIME string, increasing the processing burden, or add
yet another Identity property, it seemed best to revert to the original model for encoding
(format) properties: symbolic tokens defined for a given environment for those encodings
which are significant to the production and management processes and needs of that
environment.

It is not enough to say "text/xml" or "image/cgm". We need to differentiate between
different instances of a media object which all share the same MIME type but have different
specific encodings, such as Neutral-MU, Online, DocBook, CGM 1, CGM 3, etc. Likewise,
some encodings have a very broad range of possibilities where we wish to limit to only a
few options, such as TIFF for low, medium, and high resolution, or GIF at 72 and 600 dpi,
etc.

Symbolic token values thus allow for defining sets of encoding and format properties in a
single value which is significant for defining the identity of an instance; e.g. neutral_mu,
cgm_3, tiff_low, gif89a_600dpi, etc.

The name of the property was changed from 'format' to 'encoding' both to be more accurate
(the property now indicates both syntactic and semantic encoding, not just raw "format")
and to make more conspicuous the change in data type.

The MIME encoding identifier string will still be provided for, in the new 'content_type'
property defined in the Encoding module. The allowed values for this property are the same
as the former format property, namely any valid MIME identifier. In a sense, we have really
renamed the format property 'content_type' and moved it out of the Identity module, and
created a new property 'encoding' to indicate a finer resolution of syntactic and semantic
encoding.

It is anticipated that in the XML Schema(s) for MARS, there will be defined an Encoding
element class, which will provide required attributes for defining the MIME, schema,
resolution, filename suffix, and other fixed properties of particular symbolic encoding token
values. These can then be referenced by any system to automatically propagate them to their
relevant MARS property values as needed, and for validation purposes (i.e. tests that ensure
that e.g. the MARS content_type property can't be set to "text/sgml" for an instance with
encoding 'neutral_mu', etc.).
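
The propagation and validation idea described above can be sketched in Python as a lookup table from encoding token to fixed property values. The registry contents below (which MIME type and suffix belong to which token) and the function names are hypothetical, chosen only to illustrate the check; the actual definitions would come from the environment's own Encoding declarations.

```python
# Hypothetical registry: symbolic encoding token -> fixed property values.
ENCODINGS = {
    "neutral_mu": {"content_type": "text/xml", "suffix": "xml"},
    "cgm_3": {"content_type": "image/cgm", "suffix": "cgm"},
}


def propagate(encoding):
    """Return the MARS property values implied by an encoding token."""
    return dict(ENCODINGS[encoding], encoding=encoding)


def validate(properties):
    """Return the names of properties that contradict the encoding's fixed values."""
    fixed = ENCODINGS.get(properties.get("encoding"), {})
    return [name for name, value in fixed.items()
            if name in properties and properties[name] != value]
```

With this table, `validate({"encoding": "neutral_mu", "content_type": "text/sgml"})` reports `content_type` as inconsistent, which is the kind of test the paragraph above anticipates.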

> Default language value is now 'none* 

Since graphics with no text have no actual "language", the default language value must be
'none'.

This places the burden on tools such as NED to get user or environment defined defaults for
language, and may also have implications for query tools to "match" retrieval queries with a
defaulted specified language to instances with language values equal to 'none'. I.e., in some
retrieval applications, all other language property values may be seen as being equivalent to
'none', all other criteria withstanding.

> Media Resource Name (MARS Identity) URN syntax defined

This will be the required format for all cases where a single string identifier is needed for
any given physical storage item. See section 8.

> Percentage data type defined 

See section 5.2.5. 

> Ranking data type defined, and token rank value now of type Ranking 

See section 5.2.9.

> Removed source mapping and references to legacy metadata vocabularies

The source and synonym mapping table and references to Nokia internal legacy metadata
vocabularies were important when grounding the initial version of MARS to prior or
existing systems and environments; however, further reference to them in later versions of
the MARS specification is not necessary for understanding or application of later versions
of the standard, and maintaining the mappings of any new MARS properties to all prior or
existing vocabularies is a burdensome task which can fairly be seen as outside the scope of
the specification itself. Version 1.0 of the MARS specification will be maintained and can
be referenced when there are questions regarding the historical mappings from which that
original version was derived. There may also be other documents maintained which define
and track the synonymous intersections of various vocabularies in use within Nokia.

> Added Encoding module and properties

Based on work done primarily by the Graphics SIG, a set of new properties for specifying
graphics and other data encoding qualities was defined, and the results of that work have
been incorporated into the MARS specification.

> Version renamed to Revision and changed from String to Count

In order to define an explicit, uniform revision identification scheme, incremental editorial
revisions are numbered by simple sequential integers. The property was named 'revision' so
that 'version' could be used elsewhere as a process or system specific value, possibly the
combination of release and revision values, separated by a decimal, to represent major and
minor branches; or as some other value as needed. Within the Metia Framework, and
particularly within a GMA, only the revision value is authoritative and reliable. Any other
specified properties such as a process specific or other custom 'version' identifier are only
informational, and should not be the basis for generic Metia Framework tools or processes.

> Added the data types Content Type, Character Set, and Encoding 

See section 6.6. 

> Removed sections discussing specific serialization and encoding methods

Serialization methods such as XML, XML DTDs, XML Schemas, RDF, RDF Schemas,
etc. are more properly addressed in the Framework and GMA specifications and are not
within the scope of MARS, which is only a vocabulary and vocabulary specification
framework.

> Multivalue separator changed from semicolon to white space 

The method for differentiating between multiple values encoded in a single string has been
changed from semicolon to white space (spaces for non-string value types, line break for
string values). This follows common Internet and WWW practice and provides for a more
consistent user interface in web browser based applications (which is the case for all current
Metia Framework applications).

> Added relevance property

See section 6.3.5.

> Order of multiple values is now preserved

The order of multiple values is significant for compound actions, i.e. a sequence of actions
to be performed on the same data in succession, e.g. 'generate store', 'lock retrieve', 'locate
remove', etc. Therefore it is now mandatory that the order of multiple values be preserved
by any agents operating on MARS metadata.
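
The separator and ordering rules described above can be sketched in a few lines of Python. This is only an illustration, not part of the MARS specification: the function name `split_values` and the `is_string_type` flag are assumptions made for this example.

```python
def split_values(encoded: str, is_string_type: bool = False) -> list[str]:
    """Split a multi-value string into its values, keeping their order."""
    if is_string_type:
        # String values may themselves contain spaces, so only line
        # breaks separate one value from the next.
        return [line.strip() for line in encoded.splitlines() if line.strip()]
    # Non-string values (tokens, counts, ...) are separated by white space.
    return encoded.split()


# A compound action: the order 'lock' then 'retrieve' is preserved.
actions = split_values("lock retrieve")

# String-typed values, one per line.
keywords = split_values("media archive\nmetadata", is_string_type=True)
```

Returning a list rather than a set is the point of the sketch: a list keeps the sequence in which a compound action must be carried out.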

> Action property count changed to Multiple 

Certain agent operations are greatly simplified if multiple, sequential actions can be
specified for the same data; therefore, the action property now may have multiple values.

> Changed 'keywords' property to type String 

This is necessary to support a broad range of registry services as well as to allow the
definition of terms and compound names as keywords.

It is expected that the 'class' property be used to define classifications based on one or more
controlled vocabularies of class labels and that the 'keywords' property be used for a variety
of purposes, including ad-hoc classification labels assigned by content producers and/or
managers, index term sets for various registry services, and for input to various queries.

> Added 'includes' property to Association module 

The 'includes' property is used to define separately managed instances which are included
inline as the content of another instance. It is also utilized by DEP-REGS (the Dependency
Relation Registry Service) for profiles and queries relating to reusable components and their
occurrence within higher level instances. See section 6.7.6.

This document defines the Portable Media Archive (PMA), a physical organization model
of a file system based data repository conforming to and suitable for implementations of the
Generalized Media Archive (GMA) abstract archival model.

The PMA model is a component of the Metia Framework for Electronic Media. A basic
understanding of the Metia Framework, the GMA, and MARS is presumed by this
specification.



2 Overview



The PMA defines an explicit yet highly portable file system organization for the storage and
retrieval of information based on Media Attribution and Reference Semantics (MARS)
metadata. The PMA uses the MARS Identity and Item Qualifier metadata property values
themselves as directory and/or file names, avoiding the need for a secondary referencing
mechanism and thereby simplifying the implementation, maximizing efficiency, and
producing a mnemonic organizational structure.

This specification only defines the physical organization of a file system, and not the
processes or algorithms for accessing, manipulating, or otherwise interacting with or
operating on that file system. Different GMA implementations based on the PMA model
may interact with the data in different ways.

Any GMA may use a physical organization model other than the PMA. The PMA physical
archival model is not a requirement of the GMA abstract archival model. However, the
PMA may nevertheless be employed by such implementations both as a data interchange
format between disparate GMA implementations as well as a format for storing portable
backups of a given archive.

3 Related Documents, Standards, and Specifications

3.1 Metia Framework for Electronic Media

The Metia Framework is a generalized metadata driven framework for the management and
distribution of electronic media which defines a set of standard, open and portable models,
interfaces, and protocols facilitating the construction of tools and environments optimized
for the management, referencing, distribution, storage, and retrieval of electronic media; as
well as a set of core software components (agents) providing functions and services relating
to archival, versioning, access control, search, retrieval, conversion, navigation, and
metadata management.

http://metia.nokia.com/specifications/#Metia

3.2 Media Attribution and Reference Semantics (MARS)

Media Attribution and Reference Semantics (MARS), a component of the Metia
Framework, is a metadata specification framework and core standard vocabulary and
semantics facilitating the portable management, referencing, distribution, storage and
retrieval of electronic media.

http://metia.nokia.com/specifications/#MARS

3.3 Generalized Media Archive (GMA)

The Generalized Media Archive (GMA), a component of the Metia Framework, is an
abstract archival model for the storage and management of data based solely on Media
Attribution and Reference Semantics (MARS) metadata; providing a uniform, consistent,
and implementation independent model for information storage and retrieval, versioning,
and access control.

http://metia.nokia.com/specifications/#GMA

4 General Architecture

The physical structure of a PMA is organized as a hierarchical directory tree that follows
the MARS object/instance/component/item scoping model.

Each media object comprises a branch in the directory tree, each media instance a
sub-branch within the object branch, each media component a sub-branch within the
instance, and so forth.

Only MARS Identity and Item Qualifier property values are used to name directories and
files. All other metadata properties (as well as Identity and Qualifier properties) are defined
and stored persistently in 'meta' storage items, conforming to the serialization and
interchange encodings defined by the GMA specification.

Because Identity and Item Qualifier properties must either be valid MARS tokens or integer
values, any such property value is an acceptable directory or file name in all major file
systems in use today.

4.1 Media Object Scope

The media object scope is encoded as a directory path consisting of a sequence of nested
directories, one for each character in the media object 'identifier' property value. E.g.:

identifier="dn998282712" => d/n/9/9/8/2/8/2/7/1/2/

Identifier values are broken up in this fashion in order to support very large numbers of
media objects, possibly millions or billions, residing in a given archive. If the identifiers
were used as complete directory names, most file systems would support only several
hundred to several thousand media objects, depending on the file system.

Using only one character per directory ensures that there will be at most 37 child
sub-directories within any given directory level (one possible sub-directory for each
character in the set [a-z0-9_] allowed in MARS token values), further satisfying the
maximum directory children constraints of most modern file systems (see below).

The media object scope may contain either media instance sub-scopes or media component
sub-scopes; the latter defining information (metadata or otherwise) which is shared by or
relevant to all instances of the media object.
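
The one-character-per-directory encoding, and the figure of 37 possible children per level, can be checked with a short Python sketch. The names `TOKEN_CHARS` and `object_scope_path` are hypothetical, and the validation assumes only the character set [a-z0-9_] stated above; any further MARS token rules are not modelled.

```python
import string

# Characters allowed in MARS token values according to the text: a-z, 0-9, "_".
TOKEN_CHARS = set(string.ascii_lowercase + string.digits + "_")


def object_scope_path(identifier: str) -> str:
    """Encode an identifier as one nested directory per character."""
    if not identifier or set(identifier) - TOKEN_CHARS:
        raise ValueError(f"not a valid token: {identifier!r}")
    return "/".join(identifier) + "/"


# 26 letters + 10 digits + underscore = 37 possible child directories.
max_children = len(TOKEN_CHARS)
path = object_scope_path("dn998282712")
```

Because every directory level holds at most 37 entries, the fan-out stays small regardless of how many media objects the archive contains; only the depth grows with identifier length.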



4.2 Media Instance Scope

The media instance scope is encoded as a nested directory sub-path within the media object
scope and consisting of one directory for each of the property values for 'release', 'language',
'coverage', and 'encoding', in that order. E.g.:

release="1" language="en" coverage="global" encoding="xhtml"
=> 1/en/global/xhtml/
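
As a brief illustration of the fixed property order, the following Python sketch builds the instance sub-path from a property set. The function name and the use of a dictionary for the property set are assumptions of this example only.

```python
# The order of the four instance-level properties is fixed by the text above.
INSTANCE_PROPERTIES = ("release", "language", "coverage", "encoding")


def instance_scope_path(properties: dict[str, str]) -> str:
    """Build the instance sub-path, one directory per property value."""
    return "/".join(str(properties[name]) for name in INSTANCE_PROPERTIES) + "/"


# The dictionary order does not matter; the path order is always the same.
sub_path = instance_scope_path(
    {"encoding": "xhtml", "release": "1", "coverage": "global", "language": "en"}
)
```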



4.3 Media Component Scope

The media component scope is encoded as a sub-directory within either the media object
scope or media instance scope and named the same as the component property value. E.g.:

component="meta" => meta/

4.4 Revision Scope

The revision scope, grouping the storage items for a particular revision milestone, is
encoded as a directory sub-path within the media component scope beginning with the
literal directory 'revision' followed by a sequence of nested directories corresponding to the
digits in the revision property value (written without zero padding). E.g.:

revision="27" => revision/2/7/

The 'data' item for a given revision must be a complete and whole snapshot of the revision,
not a partial copy or set of deltas to be applied to some other revision or item. It must be
fully independent of any other storage item insofar as its completeness is concerned.
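
The digit-per-directory layout of the revision scope can be sketched as follows. The fragment scope described in section 4.5 uses the same layout under a different literal directory, so one helper covers both. The helper name `numbered_scope_path` is an assumption of this example.

```python
def numbered_scope_path(literal: str, value: int) -> str:
    """Return '<literal>/d1/d2/.../' for the decimal digits of value."""
    if value < 0:
        raise ValueError("revision and fragment values are non-negative counts")
    # str() of an int never produces leading zeros, so no padding can appear.
    return literal + "/" + "/".join(str(value)) + "/"


revision_path = numbered_scope_path("revision", 27)
fragment_path = numbered_scope_path("fragment", 5041)
```

Taking the value as an integer, rather than as a string, is a design choice of the sketch: it makes a zero-padded input such as "027" impossible to represent.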

4.5 Fragment Scope 

The fragment scope, grouping the storage items for a particular static fragment of the data
component content, is encoded as a directory sub-path within the media component scope or
revision scope and beginning with the literal directory 'fragment' followed by a sequence of
nested directories corresponding to the digits in the fragment property value (written
without zero padding). E.g.:

fragment="5041" => fragment/5/0/4/1/



4.6 Event Scope 

The event scope, grouping action triggered operations for a particular component, instance,
or object, is encoded as a directory sub-path within the media component scope, media
instance scope, or media object scope and beginning with the literal directory 'events' and
containing one or more files named the same as the MARS action property values, each file
containing a valid MARS XML instance defining the sequence of operations as ordered
property sets. E.g.:

events/store
events/retrieve
events/unlock



4.7 Storage Item

The storage item is encoded as a filename within the media component, revision, or
fragment scope and named the same as the item property value. E.g.:

item="data" => data
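
Putting the scopes of sections 4.1 to 4.7 together, a complete storage item path could be assembled as in the sketch below. This is an illustration under stated assumptions rather than a normative algorithm: the function name, its parameters, and the choice of "/" as separator are all hypothetical, and no validation of property values is attempted.

```python
from typing import Optional, Sequence


def storage_item_path(
    identifier: str,
    component: str,
    item: str,
    instance: Optional[Sequence[str]] = None,  # release, language, coverage, encoding
    revision: Optional[int] = None,
    fragment: Optional[int] = None,
) -> str:
    """Assemble object, instance, component, revision, fragment and item scopes."""
    parts = list(identifier)              # object scope: one directory per character
    if instance is not None:
        parts.extend(instance)            # instance scope: four nested directories
    parts.append(component)               # component scope
    if revision is not None:
        parts.append("revision")
        parts.extend(str(revision))       # one directory per digit
    if fragment is not None:
        parts.append("fragment")
        parts.extend(str(fragment))       # one directory per digit
    parts.append(item)                    # storage item file name
    return "/".join(parts)


object_level = storage_item_path("dn998282712", "meta", "data")
fragment_level = storage_item_path(
    "dn998282712", "data", "meta",
    instance=("1", "en", "global", "xhtml"), revision=34, fragment=0,
)
```

The revision scope is placed before the fragment scope because the text allows a fragment scope inside a revision scope but not the reverse.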

5 Host File System Requirements

This specification does not set minimum requirements on the capacities of host file systems,
nor absolute limits on the volume or depth of conforming archives. However, an
understanding of the variables which may affect portability from one file system to another
is important if data integrity is to be maintained.

This specification does, however, define the following recommended minimal constraints
on a host file system, which should be met, regardless of the total capacity or other
capabilities of the file system in question:

File and Directory Name Length: 30
Directory Depth: 64
Number of Directory Children: 100

The above specified constraints are compatible with the following commonly used file
systems, which are therefore suitable for hosting a PMA (which also does not exceed real
constraints of the given host file system):

VFAT (Windows 95/98), NTFS (Windows NT/2000), HFS (Macintosh), HPFS
(OS/2), HP/UX, UFS (Solaris), ext2 (Linux), ISO 9660 Levels 2 and 3
(CDROM), and UDF (CDRW, DVD).
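
One way to read the recommended figures above is that an archive whose paths stay within them should fit on any host file system that meets the recommendation. The following Python sketch checks a path against the three figures under that reading. The constant and function names are hypothetical, and counting the file name as one more level of depth is a deliberately conservative assumption, not something the text states.

```python
MAX_NAME_LENGTH = 30
MAX_DIRECTORY_DEPTH = 64
MAX_DIRECTORY_CHILDREN = 100


def path_problems(path: str) -> list[str]:
    """Report where a '/'-separated archive path exceeds the recommended figures."""
    segments = [s for s in path.split("/") if s]
    problems = [
        f"name longer than {MAX_NAME_LENGTH} characters: {s}"
        for s in segments
        if len(s) > MAX_NAME_LENGTH
    ]
    # Conservative assumption: the file name counts as one more level.
    if len(segments) > MAX_DIRECTORY_DEPTH:
        problems.append(f"{len(segments)} levels exceed depth {MAX_DIRECTORY_DEPTH}")
    return problems


def too_many_children(child_names: list[str]) -> bool:
    """True if a directory holds more entries than the recommended figure."""
    return len(child_names) > MAX_DIRECTORY_CHILDREN


issues = path_problems("d/n/9/9/8/2/8/2/7/1/2/meta/data")
```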

There are likely many other file systems in addition to those listed above which are suitable
for hosting a PMA.

Note that FAT (MS-DOS, Windows 3.x) and ISO 9660 Level 1 file systems are not suitable
for hosting a PMA. ISO 9660 Level 1 plus Joliet or Rock Ridge extensions may be
suitable in some cases, but this is not generally recommended.

6 Example Archive File System

The following is a fragment of an example file system organization for a Portable Media
Archive. The location of the directory paths with respect to the root directory is not
specified. The directory separator is illustrative only, and will conform to each particular
file system in which a given archive is stored.

Media object scope path segments are highlighted in blue, media instance scope segments in
red, media component scope segments in green, revision scope segments in violet, fragment
scope segments in orange, event scope segments in crimson, and storage items in black.

d/n/9/9/8/2/8/2/7/1/2/meta/data
d/n/9/9/8/2/8/2/7/1/2/meta/meta
d/n/9/9/8/2/8/2/7/1/2/meta/revision/1/data
d/n/9/9/8/2/8/2/7/1/2/meta/revision/1/meta
d/n/9/9/8/2/8/2/7/1/2/meta/revision/2/data
d/n/9/9/8/2/8/2/7/1/2/meta/revision/2/meta
d/n/9/9/8/2/8/2/7/1/2/meta/revision/3/data
d/n/9/9/8/2/8/2/7/1/2/meta/revision/3/meta
d/n/9/9/8/2/8/2/7/1/2/meta/revision/4/data
d/n/9/9/8/2/8/2/7/1/2/meta/revision/4/meta
d/n/9/9/8/2/8/2/7/1/2/meta/revision/5/data
d/n/9/9/8/2/8/2/7/1/2/meta/revision/5/meta
d/n/9/9/8/2/8/2/7/1/2/meta/events/generate

d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/revision/1/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/revision/1/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/revision/2/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/meta/revision/2/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/toc/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/toc/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/index/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/index/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/glossary/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/glossary/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/1/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/1/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/2/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/2/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/3/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/3/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/4/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/4/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/2/1/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/revision/2/1/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/events/store
d/n/9/9/8/2/8/2/7/1/2/1/en/global/docbook/data/events/remove

d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/revision/1/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/revision/1/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/revision/9/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/meta/revision/9/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/toc/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/toc/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/index/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/index/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/glossary/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/glossary/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/idmap
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/0/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/0/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/1/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/1/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/2/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/2/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/3/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/9/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/9/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/1/0/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/1/0/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/5/9/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/5/9/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/5/9/3/2/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/fragment/5/9/3/2/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/0/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/0/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/0/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/...
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/idmap
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/fragment/0/data
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/fragment/0/meta
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/fragment/
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4/fragment
d/n/9/9/8/2/8/2/7/1/2/1/en/global/xhtml/data/revision/3/4

d/n/2/4/8/2/0/5/3/meta/data
d/n/2/4/8/2/0/5/3/meta/meta
d/n/2/4/8/2/0/5/3/meta/revision/...

d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/meta/data
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/meta/meta
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/meta/revision/...
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/index/data
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/index/meta
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/data
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/meta
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/revision/1/data
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/revision/1/meta
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/revision/...
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/revision/1/7/data
d/n/2/4/8/2/0/5/3/8/en/global/cgm_4/data/revision/1/7/meta

5.8 generate

6 Serialization and Encoding of Specialized Storage Items

6.3 'idmap' Storage Items

6.4 'data' Storage Items for a specific Revision

1 Scope

This document defines the Generalized Media Archive (GMA), an abstract archival model
based solely on Media Attribution and Reference Semantics (MARS) metadata; providing a
uniform, consistent, and implementation independent model for the storage, retrieval,
versioning, and access control of electronic media.

The GMA model is a component of the Metia Framework for Electronic Media. A basic
understanding of the Metia Framework and MARS is presumed by this specification.

2 Overview 

The GMA is a central component of the Metia Framework and serves as the common
archival model for all managed media objects controlled, accessed, transferred or otherwise
manipulated by Metia Framework agencies.

The GMA provides a uniform, generic, and abstract organizational model and functional
interface to a potentially wide range of actual archive implementations; independent of
operating system, file system, repository organization, versioning mechanisms, or other
implementation details. This abstraction facilitates the creation of tools, processes, and
methodologies based on this generic model and interface which are insulated from the
internals of the GMA compliant repositories with which they interact.

The GMA defines specific behavior for basic storage and retrieval, access control based on
user identity, versioning, automated generation of variant instances, and event processing.

The identity of individual storage items is based on MARS metadata semantics and all
interaction between a client and a GMA implementation must be expressed as MARS
metadata property sets.

3 Related Documents, Standards, and Specifications

3.1 Metia Framework for Electronic Media

The Metia Framework is a generalized metadata driven framework for the management and
distribution of electronic media which defines a set of standard, open and portable models,
interfaces, and protocols facilitating the construction of tools and environments optimized
for the management, referencing, distribution, storage, and retrieval of electronic media; as
well as a set of core software components (agents) providing functions and services relating
to archival, versioning, access control, search, retrieval, conversion, navigation, and
metadata management.

http://metia.nokia.com/specifications/#Metia

3.2 Media Attribution and Reference Semantics (MARS)

Media Attribution and Reference Semantics (MARS), a component of the Metia
Framework, is a metadata specification framework and core standard vocabulary and
semantics facilitating the portable management, referencing, distribution, storage and
retrieval of electronic media.

http://metia.nokia.com/specifications/#MARS

3.3 Portable Media Archive (PMA)

The Portable Media Archive (PMA), a component of the Metia Framework, is a physical
organization model of a file system based data repository conforming to and suitable for
implementations of the Generalized Media Archive (GMA) abstract archival model.

http://metia.nokia.com/specifications/#PMA

3.4 Registry Service Architecture (REGS)

The Registry Service Architecture (REGS), a component of the Metia Framework, is a
generic architecture for dynamic query resolution agencies based on the Metia Framework
and Media Attribution and Reference Semantics (MARS), providing a unified interface
model for a broad range of search and retrieval tools.

http://metia.nokia.com/specifications/#REGS

4 General Architecture

A GMA manages media components and contains storage items.

The operation of a GMA can be divided into the following five functional units:

Storage and Retrieval of items is simply the act of associating electronic media data
streams to MARS storage item identities and making persistent, retrievable copies of those
data streams indexed by their MARS identity (either directly or indirectly), as well as the
management of creation and modification time stamps.

Access Control is based on several controlling criteria as defined for the environment in
which the GMA resides and as stored in the metadata of individual components managed by
the GMA. Access control is defined for entire components and never for individual items
within a component. Access control can also be defined for media objects and media
instances, in which case subordinate media components inherit the access configuration
from the higher scope(s) in the case that it is not defined specifically for the component.
Access control also includes the management of user identity and role metadata such as
creator, owner, contributor, etc.

Versioning is performed only for 'data' items of a media component and constitutes the
revision history of the data content of the media component. It also includes general
management and updating of creation, modification and other time stamps. Storage or
update of items other than the 'data' item neither affects the status of management metadata
stored in the 'meta' item of the component (unless the item in question is in fact the 'meta'
item of the component) nor is reflected in the revision history of the component. If a
revision history or particular metadata must be maintained for any MARS identifiable body
of content, then that content must be identified and managed as a separate media
component, possibly belonging to a separate media instance.

Generation is the process of automatically producing an item either from another item or
from metadata, or both, in response to a generation or retrieval request from some client
(possibly recursively from the GMA itself). The automatically produced item is typically
derived from the 'data' item of a component as a variant encoding, a report of some form, a
fragment or subset of the original content, or some other derivative of the original data item.

Events concern the handling of events which may trigger other operations automatically in
conjunction with the client specified operations; typically the regeneration of items,
components or instances derived from content data and/or metadata when the content from
which they are derived changes.

Every GMA must implement the storage and retrieval functional unit in some fashion (it
need not be an explicit implementation unit), but may optionally omit any of the other
functional units, or allow for them to be disabled, depending on the needs of the given
application and/or environment. It is not permitted, however, for a GMA to only partially
implement a functional unit; or rather, a GMA cannot claim to include a functional unit
unless the behavior of the functional unit as defined in this specification is fully
implemented.
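
The rule that storage and retrieval is mandatory while the other four units are optional can be modelled as a set of flags. The sketch below is only one possible representation and is not prescribed by this specification; the class name `Unit` and the function `validate_units` are hypothetical, while the abbreviations SR, V, A, G and E are those used in the property tables of section 4.1.2.

```python
from enum import Flag, auto


class Unit(Flag):
    """The five functional units, using the abbreviations of section 4.1.2."""
    SR = auto()  # Storage and Retrieval (mandatory)
    V = auto()   # Versioning
    A = auto()   # Access Control
    G = auto()   # Generation
    E = auto()   # Events


def validate_units(units: Unit) -> Unit:
    """Reject a configuration that omits the mandatory unit."""
    if Unit.SR not in units:
        raise ValueError("every GMA must implement storage and retrieval")
    return units


# A GMA offering storage, retrieval and versioning only.
configured = validate_units(Unit.SR | Unit.V)
```

A flag is either present or absent, which mirrors the requirement that a unit is either fully implemented or not claimed at all.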

4.1 Management -BY- Metadata

A GMA relies on specific MARS metadata (and only that metadata) in order to operate, and
also defines or updates MARS metadata as part of its operation. Management and
manipulation of electronic media solely via metadata is a fundamental goal of the Metia
Framework and thus also of the GMA.

4.1.1 Content versus Management Metadata 

It is important to make a clear distinction between content metadata and management
metadata. Content metadata describes the qualities and characteristics of the information
content as a whole, independent of how it is managed. Management metadata, on the other
hand, is specifically concerned with the history of the physical data, such as who may
retrieve or modify it, when it was created, whether a user is currently making modifications
to it, what the current revision identifier is, etc.

Content metadata is outside the scope of concern of a GMA, and typically is stored as a
separate 'meta' component, not a 'meta' item, such that the actual specification of the content
metadata is managed by the GMA just as any other media component. The metadata that is
of primary concern to a GMA, and which a GMA accesses, updates, and stores persistently,
is the metadata associated with each component.

A GMA manages media components, and the management metadata for each media
component is stored persistently in the 'meta' storage item of the media component.

A special case exists with regards to management metadata which might be defined at the
media instance or media object scope, where that metadata is inherited by all
sub-components of the higher scope(s). See section 4.2.2 for details.

4.1.2 MARS Properties Required by GMA

The following MARS metadata properties are required by a GMA to be defined in the input
query and/or for the target data, depending on the action being performed and which
functional units are implemented. See the pseudocode in section 5 for usage details.

The functional units are represented in the table as follows: Storage & Retrieval = 'SR',
Versioning = 'V', Access Control = 'A', Generation = 'G', and Events = 'E'.

Property                           Functional Unit    Action

identifier, release, language,     SR, V, A, G, E     qualify, retrieve, store, remove,
coverage, encoding, component,                        generate
item

identifier, release, language,     SR, A, E           lock, unlock
coverage, encoding, component

user, access                       A                  qualify, retrieve, store, remove,
                                                      lock

user                               A                  unlock

revision                           V                  qualify, retrieve, store

fragment                           SR                 qualify, retrieve, store

pointer                            SR                 retrieve

comment                            V                  store

size, pointer                      G                  generate, retrieve

4.1.3 MARS Properties Used by GMA

The following MARS metadata properties are generated, updated, or otherwise modified by
a GMA for one or more actions, depending on which functional units are implemented. See
the pseudocode in section 5 for usage details.

Property                           Functional Unit    Action

created, modified, size            SR                 store

owner, creator, modifier,          A                  store
contributor

user                               V                  lock

locked                             SR                 lock, unlock

revision                           V                  store

fragment                           G                  generate

4.1.4 Default Property Values

A GMA may assume the default values as defined by the MARS specification for all
properties which it requires but are not specified explicitly. It is an error for a required
property to have neither a default MARS value nor an explicitly specified value.

4.2 Management -OF- Metadata

In addition to relying on already defined metadata, a GMA is itself responsible for defining,
updating, and maintaining the management metadata relevant for the 'data' item of each
media component, which is stored persistently as the 'meta' item of the component. In fact,
most of the metadata produced by a GMA is later used by the GMA for subsequent
operations.

4.2.1 Persistent Storage 

A GMA is free to store 'meta' items, containing management metadata, in any internal
format; however every GMA must accept and return 'meta' storage items as XML instances
as defined in section 6 of this specification.

Content metadata, however, constituting the data content of a 'meta' component and stored
as the 'data' item of the 'meta' component, must always be a valid XML instance as defined
by this specification.

These two constraints ensure that any software agent is able to retrieve from or store to a
GMA both content and management metadata as needed, and that any GMA may resolve
inherited management metadata from 'meta' components at higher scopes in a generic
fashion.

4.2.2 Inheritance and Scope 
The MARS specification defines a set of simple rules for metadata property inheritance. In
short, properties defined at a given scope are visible at all lower scopes, and the definition
of a property at a lower scope takes precedence over any definition at a higher scope.

Management metadata may be defined at the media object or media instance scope,
applying to (being inherited by) all sub-component scopes.

It is the responsibility of the GMA to both retrieve and utilize all inherited metadata
properties of a component, as well as to differentiate inherited from component specific
properties when storing persistent metadata property sets, such that only component specific
properties are stored. This ensures that changes to inherited properties take effect on all
subsequent operations in the component scope. A GMA is free to "mirror" inherited
properties at the component scope so long as absolute synchronization is maintained
between the mirrored properties and their inherited source.
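
The inheritance rule (lower scopes override higher ones) and the storage rule (only component specific properties are persisted) can be sketched as two small Python functions. The function names and the sample values are hypothetical; the property names 'owner', 'access' and 'modifier' are taken from the tables in section 4.1. The sketch treats a component value that is identical to the inherited value as inherited, which is one possible reading and not something this specification states.

```python
def resolve(*scopes: dict[str, str]) -> dict[str, str]:
    """Merge property sets ordered from highest scope to lowest.

    Later (lower) scopes take precedence over earlier (higher) ones.
    """
    effective: dict[str, str] = {}
    for scope in scopes:
        effective.update(scope)
    return effective


def component_specific(effective: dict[str, str],
                       *higher_scopes: dict[str, str]) -> dict[str, str]:
    """Keep only the properties that differ from what would be inherited."""
    inherited = resolve(*higher_scopes)
    return {name: value for name, value in effective.items()
            if inherited.get(name) != value}


object_scope = {"owner": "alice", "access": "staff"}   # sample values only
instance_scope = {"access": "editors"}
component_scope = {"modifier": "bob"}

effective = resolve(object_scope, instance_scope, component_scope)
to_store = component_specific(effective, object_scope, instance_scope)
```

Because `to_store` omits 'owner' and 'access', a later change to either value at the object or instance scope is picked up the next time `resolve` is called for the component.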

A GMA may never include inherited properties in any 'meta' storage item output as the
result of a retrieve action.

4.3 Storage and Retrieval

Storage and Retrieval of items is simply the act of associating electronic media data streams
to MARS storage item identities and making persistent, retrievable copies of those data
streams indexed by their MARS identity (either directly or indirectly), as well as the
management of creation and modification time stamps.

Every GMA must implement die core storage »and retrieval functional unit. If versioning. 
access control, generation, and/or event units are also implemented, then the storage and 
50 retrieval operations may be augmented in one or more ways. 
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A GMA is free to use any means to organize both the repository of storage items as wdl as 
the mapping mechanisms relating MARS identity metadata to locations within that 
repository. GMA implementadons might, employ common relational or object oriented 
database technology, direct file system storage, or any number of custom and/or proprietary 
technologies. Regardless of the underlying implementation, a GMA must accept input and 
provide output in accordance vdth this specification. 

4.4 Access Control

A GMA implementation is not required to implement access control, but if access control is
provided, it must conform to the behavior defined in this specification.

Access control of media components is based on several controlling criteria as defined for
the environment in which the GMA resides and as stored in the metadata of individual
components managed by the GMA. Access control is defined for entire components and
never for individual items within a component. Access control can also be defined for
media objects and media instances, in which case subordinate media components inherit the
access configuration from the higher scope(s) in the case that it is not defined specifically
for the component.

The four controlling criteria for media access are:

1. User identity

2. Group membership(s) of user

3. Read permission for user or group

4. Write permission for user or group

4.4.1 User Identity

Every user must have a unique identifier within the environment in which the GMA
operates, and the permissions must be defined according to the set of all users (and groups)
within that environment.

A user can be a human, but can also be a software application, process, or system. This is
especially important both for licensing and for tracking operations performed on data by
automated software agents operating within the GMA environment.

4.4.2 Group Membership

Any user may belong to one or more groups, and permissions can be defined for an entire
group, and thus for every member of that group. This greatly reduces the maintenance
overhead in environments with large numbers of users and/or high user turnover (many
users coming and going).

Permissions defined for an explicit user override permissions defined for a group of which
the user is a member. Thus, if a group is allowed write permission to a component, but a
particular user is explicitly denied write permission for that component, then the user may
not modify the component.

4.4.3 Read Permission

Read permission means that the user or group may retrieve a copy of the data.

The presence of a lock marker does not prohibit retrieval of data, only modification.

If access control is not implemented, and/or unless otherwise specified globally for the
GMA environment or for a particular archive, or explicitly defined in the metadata for any
relevant scope, a GMA must assume that all users have read permission to all content.



4.4.4 Write Permission

Write permission means that the user or group may modify (store a new version of) the
data.

Write permission implies read permission, such that every user or group which has write
permission to particular content also has read permission. This is true even if the user or
group is otherwise explicitly denied read permission.

The presence of a lock marker prohibits modification by any user other than the owner of
the lock, including the owner of the component if the lock owner and component owner are
different. It is permitted for a GMA to provide a means to break a lock, but such an
operation should not be available to common users and should provide a means of logging
the event and ideally notifying the lock owner of the event.

If access control is not implemented, a GMA must assume that all users have write
permission to all content.

If access control is implemented, and unless otherwise specified globally for the GMA
environment or for a particular archive, or explicitly defined in the metadata for any
relevant scope, a GMA must assume that no users have write permission to any content.

Regardless of any other metadata-defined access specifications (not including settings
defined globally for the archive), the owner of a component always has write access to that
component.
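
The permission rules of sections 4.4.2 to 4.4.4 can be summarized in a short sketch. The `Acl` record and all function names are hypothetical; they are not part of MARS. The sketch assumes that access control is implemented, that explicit grants and denials are stored as booleans, and (an assumption the text does not settle) that when several of a user's groups carry a rule, one granting group is sufficient.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Acl:
    """Hypothetical access record for one component."""
    owner: str
    # True = explicitly allowed, False = explicitly denied, absent = unspecified.
    user_read: Dict[str, bool] = field(default_factory=dict)
    user_write: Dict[str, bool] = field(default_factory=dict)
    group_read: Dict[str, bool] = field(default_factory=dict)
    group_write: Dict[str, bool] = field(default_factory=dict)
    lock_owner: Optional[str] = None


def _decide(user, groups, user_rules, group_rules, default):
    if user in user_rules:  # an explicit user rule overrides any group rule
        return user_rules[user]
    verdicts = [group_rules[g] for g in groups if g in group_rules]
    if verdicts:
        return any(verdicts)  # assumption: one granting group is enough
    return default


def has_write_permission(acl, user, groups=()):
    if user == acl.owner:  # the owner always has write access
        return True
    # Default with access control implemented: nobody may write.
    return _decide(user, groups, acl.user_write, acl.group_write, default=False)


def has_read_permission(acl, user, groups=()):
    if has_write_permission(acl, user, groups):  # write implies read
        return True
    # Default: everybody may read.
    return _decide(user, groups, acl.user_read, acl.group_read, default=True)


def may_modify(acl, user, groups=()):
    # A lock blocks everyone except the lock owner, including the component owner.
    if acl.lock_owner is not None and acl.lock_owner != user:
        return False
    return has_write_permission(acl, user, groups)
```

Note that the lock is checked only in `may_modify`, so a lock never prevents retrieval, and that a user with write permission can read even when read permission has been explicitly denied.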


4.4.5 Access Levels 

This specification defines a set of access levels which serve as convenience terms when
defining, specifying, or discussing the "functional mode" of a particular GMA with regard
to read and write access control.

Access levels can be used as configuration values by GMA implementations to easily 
specify global access behavior for a given GMA where the implementation is capable of 
providing multiple access levels. 



Level   Read   Write
1       *      *
2       *      X
3       *      A
4       A      A



* = no access control, public access
X = access prohibited globally
A = access control by user identity

Note that because write permission subsumes, or includes, read permission, it is not
meaningful (albeit possible) to define an access level where there is read access control but
no write access control. This is because giving global write permission to any user is the
same as giving global read permission, as write permission overshadows or overrides read
permission, and thus even if a particular user were denied read access for a given storage
item, they would still have write permission, which implicitly includes read permission,
making the denial of read access ineffective.

A GMA implementation is not required to provide a particular level of access control;
however, it must be clearly stated for each implementation which level, if any, above level
1 is available. Furthermore, if access control above level 2 is provided, it must conform to
the behavior defined in this specification.
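
As a sketch of how access levels might serve as configuration values, the following assumes the level table reads: level 1 public read and write, level 2 public read with write prohibited, level 3 public read with identity-controlled write, and level 4 identity-controlled read and write. The enum and function names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    PUBLIC = "*"        # no access control, public access
    PROHIBITED = "X"    # access prohibited globally
    BY_IDENTITY = "A"   # access control by user identity


# level -> (read mode, write mode)
ACCESS_LEVELS = {
    1: (Mode.PUBLIC, Mode.PUBLIC),
    2: (Mode.PUBLIC, Mode.PROHIBITED),
    3: (Mode.PUBLIC, Mode.BY_IDENTITY),
    4: (Mode.BY_IDENTITY, Mode.BY_IDENTITY),
}


def uses_identity_control(level):
    """True for levels whose read or write access depends on user identity."""
    return Mode.BY_IDENTITY in ACCESS_LEVELS[level]
```

Under this reading, `uses_identity_control` is true exactly for the levels above 2, which are the levels that must conform to the access control behavior defined here.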

4.5 Versioning 

A GMA implementation is not required to implement versioning, but if versioning is
provided, it must conform to the behavior defined in this specification.

Versioning relates to the identification, preservation, and retrieval of particular revisions
(editions) in the editorial lifecycle of some discrete body of data. A version is a snapshot in
time, and retrieving a past version is traveling back in time to the point when that snapshot
was taken. Sequences of snapshots may be related by sharing a common ancestry while
differing in one or more recent revisions.

Versioning is often modeled as a tree, where a sequence of snapshots is a path from the
root of the tree, along the branches and sub-branches, to the leaves. Sequences are related
by their shared portions in the tree, being the common trunk and branches which are part of
both paths from the root, up to the point where the two sequences differ in a given revision,
or separate/split into two distinct branches. Each branch is given a sequential identifier
(usually a positive integer), and each level of branches, sub-branches, sub-sub-branches, etc.
is separated by some distinct punctuation, typically a period. At any given point of
separation of two revision sequences (paths through the tree), the branch may either divide
equally, such that there become two sub-branches each of which receives a new numbering
level, or the main branch may simply "grow" a sub-branch where the revision number
sequence of the main branch continues onwards at the same level while the sub-branch's
revision number sequence gains an additional level.

The primary (almost exclusive) motivation for having many distinct branches is the
management and maintenance of concurrent yet variant instances of the data, which are
accessible and used in some fashion in parallel. A good example of this is software, where
one version is being used while the next version is being developed. Problems (bugs) arising
in the currently used version may not exist in the later version under development, yet one
must still make the necessary corrections to the current version. In such a case, the software
code revision sequence "branches", with the development process of the newer version
becoming a new sub-branch and the maintenance (bug-fix) process of the current version
remaining the main branch. Both branches share a common beginning (path from the root)
but have unique progressions thereafter. In some cases, two distinct branches (related or
otherwise) might merge at some point, making the resultant data model a graph in actuality,
but it is nevertheless still common to speak in terms of tree structures.

While providing a very useful and effective means to organize and manage related editorial
sequences as connected branches, the tree-based versioning model has a number of
shortcomings. It allows arbitrarily deep trees, allowing (and in some cases encouraging) a
fragmentation of editorial sequences which is neither meaningful nor productive in practice. It
also allows for a plethora of incompatible interpretations applied to the various levels in the
tree, making the interchange of historical information difficult, and in many cases
impossible.

The MARS versioning model, which is used by every GMA, addresses the same needs
provided for in the tree-based versioning model, namely (1) the need to make (and later
retrieve) snapshots along a sequence of editorial revisions, (2) the need to manage separate
parallel sequences of revisions, and (3) the need to relate sequences with shared history,
but does so in a much simpler and (most importantly) portable fashion.

Versioning is divided into two levels: (1) an individually managed and independently
accessible editorial sequence is called a 'release' and corresponds to a branch in the
tree-based versioning model; and (2) snapshots along an editorial sequence (release) are called
'revisions' and correspond to leaves in the tree-based versioning model.

Each release is given a unique positive integer identifier. Likewise, each identified
(managed) revision within a release sequence is given a unique positive integer identifier,
and the revision numbering sequence begins anew for each release. Releases which are
derived from other releases (i.e. sub-branches growing out from parent branches) may
specify via the MARS 'source' property the particular release and revision from which they
come. These three pieces of information (release number, revision number, and source, if
any) meet all three of the above-defined versioning needs.
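
To illustrate how release number, revision number, and source together relate sequences with shared history, here is a minimal sketch. The `Release` record and the representation of 'source' as a (release, revision) pair are assumptions made for illustration; the actual encoding of the 'source' property is defined by MARS, not here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Release:
    number: int
    latest_revision: int
    # (release, revision) this release was derived from, if any (hypothetical encoding)
    source: Optional[Tuple[int, int]] = None


def history(releases: Dict[int, Release], release: int, revision: int) -> List[Tuple[int, int]]:
    """List (release, revision) pairs from newest to oldest, following 'source' links."""
    path = []
    while True:
        # Revisions within a release form a linear sequence starting at 1.
        path.extend((release, r) for r in range(revision, 0, -1))
        source = releases[release].source
        if source is None:
            return path
        release, revision = source
```

For example, with release 2 derived from revision 2 of release 1, the history of release 2, revision 3 walks back through release 2 and then continues at release 1, revision 2.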

A GMA which implements versioning is responsible only for the linear sequence of
revisions within a media component.

A GMA implementation is not responsible for the automated or semi-automated creation or
specification of new instances relating to distinct releases (branching), nor for retrieval of
revisions not unique to a particular release (paths in the tree up to the beginning of the
particular branch) from its source(s) (ancestor branches), though it is free to offer that
functionality if it so chooses. Typically, the creation of new releases (branching) will be
performed manually by a human editor, including the specification of 'source' and any other
relevant metadata values. Other tools, external to the GMA, may also exist to aid users in
performing such operations.

Versioning is performed by a GMA only for the 'data' item of a media component, and that
sequence of revisions constitutes the editorial history of the data content of the media
component. The GMA is also responsible for general management and updating of creation,
modification and other time stamp metadata. Storage or update of items other than the 'data'
item neither affects the status of management metadata stored in the 'meta' item of the
component (unless the item in question is in fact the 'meta' item of the component) nor is
reflected in the revision history of the component. If a revision history or particular
metadata must be maintained for any MARS-identifiable body of content, then that content
must be identified and managed as a separate media component, possibly belonging to a
separate media instance.

4.5.1 Revision Numbering Scheme

Revisions are identified by positive integer values (MARS Count values). The scope of each
media component is unique and revision values have significance only within the scope of
each particular media component. Revision sequences should begin with the value '1' and
proceed linearly without gaps.

The revision value zero '0' is reserved for special use by future versions of the GMA model.
GMA implementations should neither permit nor generate revisions with a value of zero.
Doing so may result in data and/or tools which are incompatible with future versions of this
standard.



4.5.2 Storage and Retrieval of Past Revisions 

A GMA implementation is free to internally organize and store past revisions in any fashion
it chooses.

This specification describes two recommended methods for storing past revisions of the
content of a media component: snapshotting and reverse deltas. In some cases, more than
one method might be applied by a GMA, depending on the nature of the media in question.

Regardless of its internal organization and operations, a GMA is required to return any
requested revision which is maintained and stored by the GMA as a complete copy.

4.5.2.1 Snapshotting

Snapshotting is simply the process of preserving a complete copy of every revision. One
takes a "snapshot" of the content at a given point in time and assigns a revision number to it.
Two clear benefits of snapshotting are that it is very easy to implement, and that special
(possibly time-consuming) regeneration operations are not needed to retrieve past revisions.
The latter can be very important in an environment where there is heavy usage and retrieval
times are a concern.

A major drawback of snapshotting is that it places heavy storage demands on the system
hosting the archive. It is also very inefficient in that the differences between revisions are
typically very slight, and therefore there is a large amount of redundant information being
stored in the archive.

It is permitted for a GMA implementation to limit the total number of past revisions that are
maintained (e.g. no more than 10) in cases where it is not practical or feasible to store every
past revision since the creation of the media component, in which case there is the
additional drawback that only a limited number of previous revisions are maintained and
data loss (of the earliest revisions) is inevitable.
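
A minimal sketch of snapshotting with an optional retention limit follows. The class and its in-memory dictionary are hypothetical stand-ins for whatever repository a GMA actually uses, and the sketch assumes that the limit counts all retained revisions, including the current one.

```python
from collections import OrderedDict


class SnapshotStore:
    """Keeps a complete copy of every revision, optionally capped in number."""

    def __init__(self, max_revisions=None):
        self._snapshots = OrderedDict()  # revision number -> complete copy
        self._max = max_revisions
        self.current = 0                 # no revisions yet; numbering starts at 1

    def store(self, data: bytes) -> int:
        self.current += 1
        self._snapshots[self.current] = bytes(data)
        if self._max is not None:
            while len(self._snapshots) > self._max:
                self._snapshots.popitem(last=False)  # discard the earliest revision
        return self.current

    def retrieve(self, revision: int) -> bytes:
        # Raises KeyError for revisions that never existed or were discarded.
        return self._snapshots[revision]
```

Retrieval is a direct lookup with no regeneration step, which is the benefit described above; the cost is that every revision occupies its full size.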

4.5.2.2 Reverse Deltas

A delta is a set of one or more editorial operations (modifications) which can be applied to a
body of data to consistently derive another body of data. A reverse delta is a delta which
allows one to derive a previous revision from a later revision.

Rather than store the complete and total content of each revision, as is done with
snapshotting, a GMA which uses reverse deltas simply stores the modifications necessary to
derive each past revision from the immediately succeeding (later) revision. A reverse delta
can then be seen as a single step backwards in time, along the sequence of editorial
milestones represented by each revision of data. To obtain a specific past revision, one must
simply begin at the current revision, and then apply the reverse deltas in order for each
previous revision until the desired revision is reached.

One could just as well have forward deltas, where the delta defines the operations needed to
derive the more recent revision from the preceding revision (and in fact the first revision
management systems using deltas worked this way). The drawback of forward deltas is that
once a given editorial sequence becomes sufficiently long, containing many revisions, it
takes longer and longer to generate the most recent revision from the very first revision,
applying all of the deltas for all of the revisions over time. Typically, only the most current
revisions are ever of interest; therefore it is much more efficient to work backwards
in time to retrieve previous revisions from the most current.

The primary benefit of using reverse (or forward) deltas in a GMA implementation is a
dramatic reduction in storage demands. Since most revisions tend to differ from the
previous revision only slightly, the GMA need only store the differences and not the entire
body of content for every revision. This can be particularly important in environments
where there are frequent but slight changes to large media objects (such as graphics or
video) or where the archive must be replicated (mirrored) to multiple sites where bandwidth
and/or disk space may be at a premium.

A drawback of using reverse deltas in a GMA implementation is that they can be difficult to
implement for some media types, especially for complex binary encodings employing
compression.
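
The reverse delta approach can be sketched as follows. This is an illustration only: it uses the matching-block computation of Python's standard `difflib` module as one possible way to derive a delta, and the operation format ("copy" and "insert" tuples) and the class name are hypothetical. A real GMA would choose a delta format suited to its media encodings.

```python
from difflib import SequenceMatcher


def reverse_delta(newer: bytes, older: bytes):
    """Operations that rebuild `older` from `newer`."""
    ops = []
    matcher = SequenceMatcher(None, newer, older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse bytes of the newer revision
        elif j2 > j1:
            ops.append(("insert", older[j1:j2]))   # bytes only the older revision has
        # a pure 'delete' needs no operation: those bytes are simply not copied
    return ops


def apply_delta(newer: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += newer[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class ReverseDeltaStore:
    """Stores the current revision in full and one reverse delta per past revision."""

    def __init__(self):
        self.current = None
        self.revision = 0
        self._deltas = {}  # revision n -> delta deriving n from n + 1

    def store(self, data: bytes) -> int:
        if self.current is not None:
            self._deltas[self.revision] = reverse_delta(data, self.current)
        self.current = data
        self.revision += 1
        return self.revision

    def retrieve(self, revision: int) -> bytes:
        if not 1 <= revision <= self.revision:
            raise KeyError(revision)
        data = self.current
        # Step backwards in time, one reverse delta per past revision.
        for rev in range(self.revision - 1, revision - 1, -1):
            data = apply_delta(data, self._deltas[rev])
        return data
```

Retrieving the current revision applies no deltas at all, while older revisions cost one delta application per step back, which matches the observation that the most recent revisions are the ones most often requested.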



4.6 Generation

A GMA implementation is not required to implement generation, but if generation is
provided, it must conform to the behavior defined in this specification.

Generation involves the automated creation of data streams which are not maintained
statically as such in the GMA but are derived in one manner or another from one or more
existing storage items. This includes conversions from one encoding or format to another,
extraction of portions of a component's content, auto-generation of indices, tables of
contents, bibliographies, glossaries, etc. as new components of a media instance, generation
of usage, history, and/or dependency reports based on metadata values, generation of
metadata profiles for use by one or more registry services, etc.

The present version of this specification only addresses one particular type of generation in
detail, though it is expected that subsequent versions of the GMA standard will specify
additional constraints, methods, and guidelines relating to other forms of generation,
including those mentioned above, as well as others.

4.6.1 Dynamic Partitioning

Dynamic partitioning is a special case of generation where a fragment of the data content is
returned in place of the entire 'data' item, possibly with automatically generated hypertext
links to preceding and succeeding content, and/or information about the structural
(contextual) qualities of the omitted content, depending on the media encoding.

Dynamic partitioning can be implemented and used whether or not static fragments exist.
Typically, static fragments are created according to the most common usage, whereas
dynamic partitioning is relied upon for more specialized applications.

Dynamic partitioning is controlled by two metadata properties, in addition to those defining
the identity of the source data item: 'size' and (optionally) 'pointer'. The single determining
factor for a partition of data is the maximum number of bytes which the fragment can
contain. The point within the data item from which the fragment is extracted can be
specified by an optional 'pointer' property value (if the encoding supports it).

The GMA then extracts the requested fragment, starting either at the beginning of the data
item or at the point specified by the pointer value, and collecting the largest coherent and
meaningful sequence of content up to but not exceeding the specified number of content
bytes. What constitutes a coherent and meaningful sequence will depend on the media
encoding of the data and possibly interpretations inherent in the GMA implementation
itself.

Any fragment of a data item must employ the same media encoding as the data item and be
a valid data stream according to the rules and constraints of that encoding.
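
As an illustration of dynamic partitioning, the sketch below handles one simple encoding, plain text, and treats a blank-line-separated paragraph as the smallest coherent unit. Both choices are assumptions, as is the representation of 'pointer' as a byte offset at a paragraph start; a real GMA would resolve pointers and coherence according to the media encoding.

```python
def partition(data: bytes, size: int, pointer: int = 0) -> bytes:
    """Return the longest run of whole paragraphs starting at `pointer` within `size` bytes."""
    separator = b"\n\n"
    fragment = b""
    position = pointer
    while position < len(data):
        end = data.find(separator, position)
        # Include the separator with its paragraph; the last paragraph runs to the end.
        end = len(data) if end == -1 else end + len(separator)
        if len(fragment) + (end - position) > size:
            break  # the next paragraph would exceed the byte limit
        fragment += data[position:end]
        position = end
    return fragment
```

If even the first paragraph exceeds the limit, this sketch returns an empty fragment rather than cutting a paragraph in the middle, since a truncated unit would no longer be a coherent sequence.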

4.7 Events

A GMA implementation is not required to implement event handling, but if event handling
is provided, it must conform to the behavior defined in this specification.

The event handling functionality defined for a GMA is very simple, owing to the generic
and abstract model defined by MARS metadata.

For each storage item, media component, media instance, or media object, a set of one or
more MARS property sets defining some operation(s) can be associated with each MARS
action, such that when that action is successfully performed on that item, component,
instance, or object, the associated operations are executed. Automated operations are thus
defined for the source data and not for any target data which might be automatically
generated as a result of an event-triggered operation.

Each operation property set must specify the metadata properties necessary for it to be
executed correctly, such as the action(s) to perform and possibly including the CGI URL of the
agency which is to perform the action. The GMA is free to employ customized mechanisms
for determining how a given operation is to be performed, and by which software
component or agent, if otherwise unspecified in the property set using standard MARS and
Metia Framework conventions.

In the case of a remove action, which will result in the removal of any events defined at the
same scope as the removed data, the GMA is still required to execute any operations
associated with the remove action defined at that scope, after successful removal of the data,
even though the operations themselves are part of the data removed and will never be
executed again in that context.

The most common type of operation for events is a compound 'generate store' action which
generates a new target item from an input item and stores it persistently in the GMA, taking
into account all versioning and access controls in force. This is useful for automatically
updating components such as the toc (Table of Contents) or index when a data component is
modified, or for generating static fragments of an updated data component.
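
The event model described above can be sketched as a small registry. Everything here is hypothetical: scopes are represented by plain strings, operation property sets by dictionaries, and "executing" an operation is reduced to appending it to a log, standing in for dispatch to whatever agent would actually perform it.

```python
class EventRegistry:
    """Associates operation property sets with (scope, action) pairs."""

    def __init__(self):
        self._operations = {}  # (scope, action) -> list of operation property sets
        self.executed = []     # log standing in for real dispatch to an agent

    def associate(self, scope, action, operation):
        self._operations.setdefault((scope, action), []).append(dict(operation))

    def fire(self, scope, action):
        """Call after `action` has been performed successfully on `scope`."""
        for operation in self._operations.get((scope, action), []):
            self.executed.append((scope, operation))

    def remove_scope(self, scope, delete_data):
        # Capture the remove operations first: they are deleted with the data.
        pending = list(self._operations.get((scope, "remove"), []))
        delete_data(scope)  # expected to raise if the removal fails
        for key in [k for k in self._operations if k[0] == scope]:
            del self._operations[key]
        # Execute only after successful removal, as required for remove actions.
        for operation in pending:
            self.executed.append((scope, operation))
```

The `remove_scope` method shows the one subtle case: operations tied to the remove action run once after the data is gone, even though their own definitions were removed along with it.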

A GMA is free to associate automated operations globally for any given action, such that
the operations are applied within the scope of the data being acted upon. A GMA is also
free to associate automated operations with triggers other than MARS actions, such as
recurring times or days of the week, for the purpose of removing expired data such as via
a 'locate remove' compound action, where the locate query defines the expiration based on a
comparison of the current date with the end^pov or modified properties. A GMA, however,
may only define automated operations in terms of MARS property sets.

5 Actions

The following sections provide pseudocode for the core GMA operations corresponding to
Metia Framework agent actions.

Note that the pseudocode is intended to be illustrative and informal, and not a rigorous
specification of any particular implementation.

For every action, the significant metadata properties are identified. Properties which are
highlighted in italics will be assigned default values as specified in MARS if not otherwise
defined. Underlined properties may be optional in certain circumstances, depending on the
functional units implemented or active for the GMA.

Retrieval of metadata for a given media component scope includes all inherited metadata
from media object and media instance scopes.

5.1 qualify

Verify that a particular storage item (possibly qualified for revision or fragment) exists (has
an identity) in the archive; or, if read access control is active, that the item exists and the
user has read access for the item. The storage item may have zero content bytes. If read
access control is active and the user does not have read access to the item, yet it exists, the
action will nevertheless return 'false'. This is a security feature to prevent unauthorized users
from determining which storage items exist, even if they cannot access them.

Synonyms:

Verify, Check, Exists

Properties:

identifier, release, language, coverage, encoding, component, item, user, access,
revision, fragment

Pseudocode:

Boolean qualify (MARS item)
{
    Retrieve MRN from MARS item;
    Resolve MRN to archive location for item;
    if (item exists in archive)
    {
        if (Versioning and input item property is equal to 'data')
        {
            Retrieve metadata for component;
            Retrieve value of revision property from component metadata;
            if (component revision not equal to input revision)
            {
                if (input revision cannot be retrieved or regenerated)
                {
                    Return 'false';
                }
            }
            if (input fragment value specified)
            {
                if (fragment cannot be retrieved or regenerated)
                {
                    Return 'false';
                }
            }
        }
        if (Read Access Control)
        {
            Retrieve metadata for component;
            Retrieve value of access property from component metadata;
            if (NOT (user has write access OR has read access))
            {
                Return 'false';
            }
        }
        Return 'true';
    }
    else
    {
        if (AutoGeneration
            AND the item can be generated from
            one or more other source items in the archive)
        {
            for each source item
            {
                if (self.qualify(source_item) equal to 'true')
                {
                    Return 'true';
                }
            }
        }
    }
    Return 'false';
}



Comments:

Mapping the MARS property set to an MRN ensures that an actual storage item is
specified, and if any identity properties were omitted in the input MARS property set,
the default values are applied. It also frees the GMA implementation from tracking any
changes in default values specified by the MARS standard.
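
The security property of qualify, that an unreadable item is reported exactly like a missing one, can be shown in a reduced sketch. It leaves out versioning, fragments, and auto-generation, and it assumes a hypothetical in-memory archive keyed by an already resolved identity string, with a hypothetical per-item record of permitted readers and writers.

```python
def qualify(archive, acl, item_id, user, read_access_control=True):
    """Return True only if the item exists and, when read control is on, is readable."""
    if item_id not in archive:  # an item with zero content bytes still exists
        return False
    if read_access_control:
        permissions = acl.get(item_id, {})
        readers = permissions.get("read")  # None means read access is unrestricted
        writers = permissions.get("write", set())
        if readers is not None and user not in readers and user not in writers:
            return False  # indistinguishable from "item does not exist"
    return True
```

Because both failure cases return the same value, a caller without read access cannot use qualify to probe which storage items exist.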

5.2 retrieve

Synonyms:

Read, Open, Check Out

Properties:

identifier, release, language, coverage, encoding, component, item, user, access,
revision, fragment, pointer

Pseudocode:

DataStream retrieve (MARS item)
{
    if (self.qualify(item) equal to 'false')
    {
        Report error and Abort;
    }
    Retrieve MRN from MARS item;
    Resolve MRN to archive location for item;
    if (item does not exist in archive)
    {
        Determine best source item for requested target item;
        Return self.generate(source_item, item);
    }
    if (input item property is equal to 'data')
    {
        if (Versioning)
        {
            Retrieve metadata for component;
            Retrieve value of revision property from component metadata;
            if (component revision not equal to input revision)
            {
                Set target revision to input revision;
            }
            else
            {
                Set target revision to current component revision;
            }
            if (input fragment value specified)
            {
                Retrieve or regenerate fragment for target revision;
            }
            elsif (input pointer specified
                   and pointer is single ID reference)
            {
                Retrieve idmap for component for target revision;
                Resolve pointer to fragment number;
                if (pointer resolves to fragment number)
                {
                    Retrieve or regenerate fragment for target revision;
                }
                else
                {
                    Retrieve or regenerate data item for target revision;
                }
            }
            else
            {
                Retrieve or regenerate data item for target revision;
            }
            Return data item or fragment for revision as DataStream;
        }
        else
        {
            if (input fragment value specified)
            {
                Retrieve or regenerate specified fragment for data item;
            }
            elsif (input pointer specified and pointer is single ID reference)
            {
                Retrieve idmap for component;
                Resolve pointer to fragment number;
                if (pointer resolves to fragment number)
                {
                    Retrieve or regenerate fragment;
                }
                else
                {
                    Retrieve data item;
                }
            }
            else
            {
                Retrieve data item;
            }
            Return data item or fragment as DataStream;
        }
    }
    Return input specified item as DataStream;
}

Comments:

Verification of read access and existence of a particular revision or fragment of a data
item is handled by the qualify() action, so the retrieve() action need not recheck these.

5.3 store

Synonyms:

Write, Save, Check In

Properties:

identifier, release, language, coverage, encoding, component, item, user, access,
revision, fragment, created, modified, owner, creator, modifier, contributor, comment
Pseudocode:

store (MARS item, DataStream input)
{
    Retrieve MRN from MARS input;
    if (lock item does not exist for component)
    {
        self.lock(item); // user must have write permission to succeed
    }
    Retrieve metadata for component;
    if (input item property is equal to 'data')
    {
        if (data item exists)
        {
            if (Versioning)
            {
                if (input data item identical to current data item)
                {
                    Notify user that revisions are identical;
                    self.unlock(item);
                    Exit;
                }
                Set comment in component metadata to input comment;
                Store component metadata to meta item for component;
                Move current data item under current revision;
                Move current meta item under current revision;
                if (Static Fragments)
                {
                    Move current idmap item under current revision;
                    Move current fragments under current rev. (optional);
                }
                Increment revision number in component metadata;
            }
            Retrieve owner from component metadata;
            Retrieve contributor from component metadata;
            if (owner not equal to user and user not in contributor)
            {
                Add input user to contributor in component metadata;
            }
        }
        else
        {
            if (Versioning)
            {
                Set revision in component metadata to '1';
            }
            Set creator in component metadata to input user;
            Set owner in component metadata to input user;
            Set created in component metadata to current time;
        }
        Set modifier in component metadata to input user;
        Set modified in component metadata to current time;
        Set size in component metadata to bytes in input item;
        Store component metadata to meta item for component;
    }
    Store input DataStream to input specified item;
    self.unlock(item);
}

Comments:

When storing a data item, the revision cannot be specified. The GMA must begin all
revision sequences from '1' and increment each subsequent revision linearly.
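
The versioned branch of the store action can be sketched in runnable form. This is a simplified illustration with versioning assumed active: locking, access checks, and static fragments are omitted, the `Component` class and its in-memory `history` dictionary are hypothetical, and the caller never supplies a revision number.

```python
import copy
import time


class Component:
    """Hypothetical in-memory media component: current data, metadata, past revisions."""

    def __init__(self):
        self.data = None
        self.meta = {}
        self.history = {}  # revision -> (data, metadata as archived)


def store(component, user, data, comment=""):
    """Store `data` as the next revision; the GMA alone assigns revision numbers."""
    meta = component.meta
    if component.data is not None:
        if data == component.data:
            return meta["revision"]  # identical content: no new revision
        meta["comment"] = comment
        # Move the current data and metadata under the current revision.
        component.history[meta["revision"]] = (component.data, copy.deepcopy(meta))
        meta["revision"] += 1        # linear increment, no gaps
        if user != meta["owner"] and user not in meta["contributor"]:
            meta["contributor"].append(user)
    else:
        now = time.time()
        meta.update(revision=1, creator=user, owner=user, created=now, contributor=[])
    meta.update(modifier=user, modified=time.time(), size=len(data))
    component.data = data
    return meta["revision"]
```

The first store creates revision 1, and each later store of changed content archives the outgoing revision before incrementing, so the sequence always begins at 1 and proceeds without gaps.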



5.4 remove

Remove one or more storage items defined for a given scope, including any events
associated with any actions at the specified scope.

Synonyms:

Delete

Properties:

identifier, release, language, coverage, encoding, component, item, user, access

Pseudocode:

remove (MARS property_set)
{
    if (identifier property not defined)
    {
        Report error and Abort;
    }
    MARS[] items = self.locate(property_set);
    foreach item in items[]
    {
        Retrieve MRN from MARS item;
        if (item == 'data') // only check each component once, by data item
        {
            Retrieve metadata for component;
            if (Write Access Control)
            {
                Retrieve value of access property from component metadata;
                if (user does not have write access)
                {
                    Report error and Abort;
                }
            }
            if (lock item exists for component)
            {
                Retrieve value of user property from component metadata;
                if (input user not equal to component user)
                {
                    Report error and Abort; // not lock owner
                }
            }
        }
    }
    foreach item in items[]
    {
        Retrieve MRN from MARS item;
        if (lock item does not exist for component)
        {
            self.lock(item);
        }
        Delete data stream associated with item from system;
        self.unlock(item);
    }
}



Comments: 

The input MARS property set to the remove action must define a media object, media
instance, media component, or storage item.

Any user who has write permission for a component can remove that component.

Any user who has write permission for all components of a media instance can remove
that media instance.

Any user who has write permission for all immediate components and all instances of a
media object can remove that media object.

The removal of any component, instance, or object includes the removal of all storage
items and associated events within or belonging to that scope.

Any events associated with the remove action which are valid for the scope of removed
data must be executed even though the specifications of those actions are removed
along with the other stored data.



5.5 locate

Given a set of Identity properties, produce a listing of zero or more storage items which
match all specified properties; and if read access control is used, only include those items
for which the user has read access.

Synonyms:

Find, Search, List

Properties:

identifier, release, language, coverage, encoding, component, item, user, access

Pseudocode:

MARS[] locate (MARS query)
{
    Remove and save 'user' property value from query, if defined;
    MARS[] items = All storage items matching the MARS query;
    if (Read Access Control)
    {
        foreach item in items[]
        {
            Set user property in item to input user property value;
            if (self.qualify(item) equal to 'false')
            {
                Remove item from items[]; // no read permission
            }
        }
    }
    Return items[]; // possibly an empty list
}



Comments:

The MARS property sets for each returned item are only required to contain values for
Identity properties, i.e. identifier, release, language, coverage, encoding, component,
and item. Any other included properties are optional and informative only. Applications
may not rely on any non-Identity properties being returned by any GMA.

MARS property sets which do not fully identify a unique storage item may NOT be
returned in the result list; i.e. every Identity property must have an explicit value
defined. Default implicit values should not be applicable to any property set returned by
the locate action.
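The locate action can be illustrated with a minimal Python sketch. It is not the GMA implementation: the in-memory `Archive` class, the representation of items as dictionaries, and the rule that the 'access' property lists the users allowed to read are all assumptions made for illustration, and `qualify` is only a stand-in for the real qualify action.

```python
from dataclasses import dataclass, field

# Identity properties that every returned property set must define explicitly.
IDENTITY = ("identifier", "release", "language", "coverage",
            "encoding", "component", "item")


@dataclass
class Archive:
    """Hypothetical in-memory archive; each item is a dict of property values."""
    items: list = field(default_factory=list)
    read_access_control: bool = True

    def qualify(self, item, user):
        # Assumed rule: 'access' holds the users permitted to read the item.
        return user in item.get("access", [])

    def locate(self, query):
        query = dict(query)
        user = query.pop("user", None)  # remove and save the 'user' property
        matches = [it for it in self.items
                   if all(it.get(name) == value for name, value in query.items())]
        if self.read_access_control:
            matches = [it for it in matches if self.qualify(it, user)]
        # Only Identity properties are guaranteed in the result sets.
        return [{name: it[name] for name in IDENTITY} for it in matches]
```

An empty list is a valid result, matching the "possibly an empty list" comment in the pseudocode.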

5.6 lock

Lock a particular component in the archive. If write access control is used and the
component already exists, the user is required to have write access for the component. Fails
if a lock already exists for the component.

Synonyms:

Check out

Properties:

identifier, release, language, coverage, encoding, component, user, access, locked

Pseudocode:

lock (MARS component)
{
    if (lock item exists for component)
    {
        Report error and Abort;
    }

    Retrieve metadata for component;
    if (Write Access Control)
    {
        Retrieve value of access property from component metadata;
        if (user does not have write access)
        {
            Report error and Abort;
        }
    }

    Create lock item for component;
    Set user property in component metadata to input user;
    Store component metadata to meta item for component;
}



5.7 unlock

Remove the lock on a given component. The user must be the owner of the lock, defined by
the user property in the component metadata. Fails if no lock exists.

Synonyms:

Check in, Release

Properties:

identifier, release, language, coverage, encoding, component, user

Pseudocode:

unlock (MARS component)
{
    if (lock item does not exist for component)
    {
        Report error and Abort;
    }

    Retrieve metadata for component;
    Retrieve value of user property from component metadata;
    if (input user not equal to component user)
    {
        Report error and Abort; // not lock owner
    }

    Remove user property from component metadata;
    Store component metadata to meta item for component;
    Remove lock item for component;
}
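The lock and unlock actions of sections 5.6 and 5.7 can be sketched together in Python. This is an illustrative model only: the `ComponentStore` class, the `LockError` exception, and the convention that the 'access' property lists the users with write access are assumptions, not part of the specification.

```python
class LockError(Exception):
    """Raised where the pseudocode says 'Report error and Abort'."""


class ComponentStore:
    """Hypothetical store keeping component metadata and lock items in memory."""

    def __init__(self, write_access_control=True):
        self.write_access_control = write_access_control
        self.metadata = {}  # component id -> metadata dict
        self.locks = set()  # component ids for which a lock item exists

    def lock(self, component, user):
        if component in self.locks:
            raise LockError("lock already exists for component")
        meta = self.metadata.setdefault(component, {})
        # Assumed rule: 'access' lists the users with write access.
        if self.write_access_control and meta.get("access") is not None:
            if user not in meta["access"]:
                raise LockError("user does not have write access")
        self.locks.add(component)
        meta["user"] = user  # record the lock owner

    def unlock(self, component, user):
        if component not in self.locks:
            raise LockError("no lock exists for component")
        meta = self.metadata[component]
        if meta.get("user") != user:
            raise LockError("not lock owner")
        del meta["user"]
        self.locks.discard(component)
```

Only the user recorded in the 'user' property can release the lock, which mirrors the ownership check in the unlock pseudocode.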

5.8 generate

Generate the target item from the source item, if possible, and return it as a data stream.

Synonyms:

Transform, Convert, Produce, Extract

Properties:

identifier, release, language, coverage, encoding, component, item

Pseudocode:

DataStream generate (MARS source_item, MARS target_item)
{
    if (self.qualify(source_item) equal to 'false')
    {
        Report error and Abort; // either no read access or item
                                // does not exist in archive
    }

    Determine proper generation process from source to target;
    if (generation is not possible)
    {
        Report error and Abort;
    }

    Generate target from source and return as DataStream;
}

Comments:

The generate action is often used in conjunction with the retrieve action when a given
item does not exist in the archive, such as the dynamic creation of a data fragment or
converting from one encoding to another.

It's up to the GMA to know how to determine if a given generation is possible, typically
employing the help of an external agent to resolve and perform the generation (such as
a conversion agent).

6 Serialization and Encoding of Specialized Storage Items

Several storage items defined by MARS and central to the operation of any GMA must 
conform to particular serialization and encoding requirements insofar as data interchange is 
concerned. Actual internal storage, encoding, and management of these items is up to each 
particular GMA implementation in some cases, but every GMA implementation must accept 
and return the following storage items as defined by this specification.



6.1 'meta' Storage Items

Every 'meta' storage item which is presented to a GMA for storage or returned by a GMA
on retrieval must be a valid XML instance conforming to the MARS 2.0 DTD:

http://metia.nokia.com/schemas/mars/2.0/dtd/

Metadata property values "contained" within 'meta' storage items need not be stored or
managed internally in the GMA using XML, but every GMA implementation must accept
and return 'meta' items as valid XML instances.



6.2 'data' Storage Items Within 'meta' Media Components

The same DTD defining the serialization of 'meta' storage items is also used to encode all
'data' storage items for all 'meta' components. Although a GMA must persistently store all
'data' storage items literally, it may also choose to parse and extract a copy of the metadata
property values defined within meta component data items to more efficiently determine
inherited metadata properties at specific scopes within the archive.



6.3 'idmap' Storage Items

Every 'idmap' storage item which is presented to a GMA for storage or returned by a GMA
on retrieval must be encoded as a CSV (comma separated value) data stream defining a
table with two columns where each row is a single mapping and where the first
column/field contains the value of the 'pointer' property defining the symbolic reference and
the second column/field contains the value of the 'fragment' property specifying the data
content fragment containing the target of the reference. E.g.:

#EID284828,22a
#EID192,12
#EID9928,3281
#EID727,340

The mapping information "contained" within 'idmap' storage items need not be stored or
managed internally in the GMA in CSV format, but every GMA implementation must
accept and return 'idmap' items as CSV formatted data streams.
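As a small illustration of the two-column format, the following Python sketch reads and writes an 'idmap' data stream with the standard `csv` module. The function names `parse_idmap` and `serialize_idmap` are hypothetical, and keeping the mapping in a dictionary is just one possible internal representation.

```python
import csv
import io


def parse_idmap(stream_text):
    """Parse a CSV 'idmap' stream into a {pointer: fragment} dictionary."""
    mapping = {}
    for row in csv.reader(io.StringIO(stream_text)):
        if not row:
            continue  # tolerate blank lines
        pointer, fragment = row  # exactly two columns per mapping
        mapping[pointer] = fragment
    return mapping


def serialize_idmap(mapping):
    """Serialize a {pointer: fragment} dictionary back to a CSV stream."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for pointer, fragment in mapping.items():
        writer.writerow([pointer, fragment])
    return out.getvalue()
```

Because the interchange format is fixed, the archive can store the mapping in any internal form so long as it round-trips through these two functions.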



6.4 'data' Storage Items for a specific Revision

The GMA must return the complete and valid contents of a given 'data' storage item for a
specified revision (if it exists), regardless of how previous revisions are managed internally.
Reverse deltas or other change summary information which must be applied in some
fashion to regenerate or rebuild the desired revision must never be returned by a GMA, even
if that is all that is stored for each revision data item internally. Only the complete data item
is to be returned.

1 Scope

This document defines the Registry Service Architecture (REGS), a generic architecture for
dynamic query resolution agencies based on the Metia Framework and Media Attribution
and Reference Semantics (MARS), providing a unified interface model for a broad range of
search and retrieval tools.

The REGS architecture is a component of the Metia Framework for Electronic Media. A
basic understanding of the Metia Framework and MARS is presumed by this specification.

2 Overview 

REGS provides a generic means to interact with any number of specialized search and
retrieval tools using a common set of protocols and interfaces based on the Metia
Framework; namely MARS metadata semantics and either a POSIX or CGI compliant
interface. As with other Metia Framework components, this allows for much greater
flexibility in the implementation and evolution of particular solutions while minimizing the
interdependencies between the tools and their users (human or otherwise).

Being based on MARS metadata allows for a high degree of automation and tight
synchronization with the archival and management systems used in the same environment,
with each registry service deriving its own registry database directly from the metadata
stored in and maintained by the various archives themselves; while at the same time, each
registry service is insulated from the implementation details of and changes in the archives
from which it receives its information.

Every registry service shares a common architecture and fundamental behavior, differing
primarily only in the actual metadata properties required for their particular application.

3 Related Documents, Standards, and Specifications

3.1 Metia Framework for Electronic Media

The Metia Framework is a generalized metadata driven framework for the management and
distribution of electronic media which defines a set of standard, open and portable models,
interfaces, and protocols facilitating the construction of tools and environments optimized
for the management, referencing, distribution, storage, and retrieval of electronic media; as
well as a set of core software components (agents) providing functions and services relating
to archival, versioning, access control, search, retrieval, conversion, navigation, and
metadata management.

http://metia.nokia.com/specifications/#Metia

3.2 Media Attribution and Reference Semantics (MARS)

Media Attribution and Reference Semantics (MARS), a component of the Metia
Framework, is a metadata specification framework and core standard vocabulary and
semantics facilitating the portable management, referencing, distribution, storage and
retrieval of electronic media.

http://metia.nokia.com/specifications/#MARS

3.3 Generalized Media Archive (GMA)

The Generalized Media Archive (GMA), a component of the Metia Framework, is an
abstract archival model for the storage and management of data based solely on Media
Attribution and Reference Semantics (MARS) metadata; providing a uniform, consistent,
and implementation independent model for information storage and retrieval, versioning,
and access control.

http://metia.nokia.com/specifications/#GMA






4 Key Terms and Concepts 

4.1 Property 

A property, as defined by the MARS specification, is a quality or attribute which can be 
assigned or related to an identifiable body of information, and is defined as an ordered 
collection of one or more values sharing a common name. The name of the collection 
represents the name of the property and the value(s) represent the realization of that
property. Typically, constraints are placed on the values which may serve as the realization
of a given property.

4.2 Property Set 

A property set is any set of valid MARS metadata properties.

4.3 Profile

A profile is a property set which, in addition to any non-identity related properties,
explicitly defines the identity of a specific media object, media instance, media component,
or storage item (possibly a qualified data item).

Default values for unspecified Identity properties are not applied to a profile and any given
profile may not have scope gaps in the defined Identity properties (i.e. 'item' defined but not
'component', etc.). Profiles must unambiguously and precisely identify a media object,
instance, component or item.

In addition to identity, the retrieval location of the archive or other repository where that
information resides must be specified using either the 'location' or 'agency' properties. If
both are specified, they must define the equivalent location.

The additional properties included in any given profile are defined by the registry service
operating on or returning the profile, and may not necessarily contain any additional
properties other than those defining identity and location.
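The constraints on a profile can be checked mechanically. The sketch below is a hedged illustration in Python: the grouping of Identity properties into four scope levels is an assumption inferred from the property lists used elsewhere in this document, and `validate_profile` is a hypothetical helper, not a function defined by MARS.

```python
# Assumed grouping of Identity properties, from highest to lowest scope.
LEVELS = [
    ("object", ("identifier",)),
    ("instance", ("release", "language", "coverage", "encoding")),
    ("component", ("component",)),
    ("item", ("item",)),
]


def validate_profile(profile):
    """Return a list of problems; an empty list means the profile looks valid."""
    problems = []
    defined = [any(profile.get(name) for name in names) for _, names in LEVELS]
    if not defined[0]:
        problems.append("'identifier' is not defined")
    # A scope gap: a level is defined while the level directly above it is not.
    for i in range(1, len(LEVELS)):
        if defined[i] and not defined[i - 1]:
            problems.append(
                f"scope gap: {LEVELS[i][0]} defined but not {LEVELS[i - 1][0]}")
    if not (profile.get("location") or profile.get("agency")):
        problems.append("neither 'location' nor 'agency' is defined")
    return problems
```

The check does not verify that 'location' and 'agency' resolve to the same place when both are given, since that requires knowledge of the agencies involved.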

4.4 Query 

A query is a special kind of property set which defines a set of property values which are to 
be compared to the equivalent properties in one or more profiles. A query differs from a 
regular property set in that it is allowed to contain values which may deviate from the 
MARS specification in the following ways:

4.4.1 Multiple Values

Properties normally allowing only a single value may have multiple values defined in a 
query. 

The normal interpretation of multiple query values is to apply 'OR' logic such that the
property matches if any of the query values match any of the target values; however, a
given registry service is permitted, depending on the application, to apply 'AND' logic
requiring that all query values match a target value, and optionally that every target value is
matched by a query value.

It must be clearly specified for a registry service if 'AND' logic is being applied to multiple
query value sets.
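The two interpretations can be contrasted in a few lines of Python. The function names and the `exhaustive` flag are hypothetical, and plain equality stands in for whatever value comparison a given registry service applies.

```python
def matches_any(query_values, target_values):
    """'OR' logic: the property matches if any query value equals any target value."""
    return any(q in target_values for q in query_values)


def matches_all(query_values, target_values, exhaustive=False):
    """'AND' logic: every query value must equal some target value.

    With exhaustive=True, every target value must also be matched by a
    query value (the optional stricter form).
    """
    if not all(q in target_values for q in query_values):
        return False
    if exhaustive:
        return all(t in query_values for t in target_values)
    return True
```

For a query with values en and fi against a target holding en and sv, `matches_any` succeeds while `matches_all` fails.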

4.4.2 Regular Expressions

Query values for properties of MARS type String may contain valid POSIX regular
expressions rather than literal strings; in which case the property matches if the specified
regular expression pattern matches the target value.

4.4.3 Comparison Operators 
Query values may be prefixed by one of several comparison operators, with one or more 
mandatory intervening space characters between the operator and the query value. 

The order of comparison for binary operators is: 

query value {operator} target value 

Not all comparison operators are necessarily meaningful for all property value types, nor
are all operators required to be supported by any given registry service.

It must be clearly specified for every registry service which, if any, comparison operators
are supported in input queries.

In the rare case that a literal string value begins with a comparison operator followed by one
or more intervening spaces, the initial operator character should be preceded by a backslash
character '\'. The registry service must then identify and remove the backslash character
prior to any comparisons.

4.4.3.1 Negation "!"

The property matches if the query value fails to match the target value.

E.g. "! approved".

4.4.3.2 Less Than "<"

The property matches if the query value is less than the target value.

E.g. "< 2.5".

4.4.3.3 Greater Than ">"

The property matches if the query value is greater than the target value.

E.g. "> draft".

4.4.3.4 Less Than or Equal To "<="

The property matches if the query value is less than or equal to the target value.

E.g. "<= 2000-09-22".

4.4.3.5 Greater Than or Equal To ">="

The property matches if the query value is greater than or equal to the target value.

E.g. ">= 5000".
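A minimal Python sketch of operator parsing follows. It keeps the stated order of comparison (query value, operator, target value) and the backslash escape; the function names and the `convert` parameter, which turns strings into comparable values for a given property type, are assumptions made for illustration.

```python
import operator

# Operator prefixes; comparison order is: query value {operator} target value.
OPERATORS = {
    "!": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def parse_query_value(raw):
    """Split a raw query value into (comparison function, operand string)."""
    if raw.startswith("\\"):
        unescaped = raw[1:]
        head, sep, _ = unescaped.partition(" ")
        if sep and head in OPERATORS:
            return operator.eq, unescaped  # escaped literal such as '\< 2.5'
        return operator.eq, raw            # ordinary value starting with '\'
    head, sep, rest = raw.partition(" ")
    if sep and head in OPERATORS:
        return OPERATORS[head], rest.lstrip(" ")
    return operator.eq, raw


def property_matches(raw_query_value, target_value, convert=str):
    """Apply the parsed comparison with the query value on the left-hand side."""
    compare, operand = parse_query_value(raw_query_value)
    return compare(convert(operand), convert(target_value))
```

Note the direction: under this reading, a query value of "< 2.5" matches targets greater than 2.5, because the query value sits on the left of the operator.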

4.4.4 Wildcard Value Operator

Any property in a query may have specified for it the special value "*", regardless of
property type, which effectively matches any defined value in any target. The wildcard
value does not, however, match a property which has no value defined for it.

The wildcard value operator may be preceded by the negation operator.

This special wildcard operator is particularly useful for specifying the level of Identity
scoping of the returned profiles for a registry which stores profiles for multiple levels of
scope (see section XXX). It is also used to match properties where all that is of interest is
that they have some value defined but it doesn't matter what the value actually is. Or, when
combined with the negation operator, to match properties which have no value defined. The
latter is useful for validation and quality assurance processes to isolate information which is
missing mandatory or critical metadata properties.

In the rare case that a literal string value equals the wildcard value operator, the wildcard
value operator must be preceded by a backslash character '\'. The registry service must then
identify and remove the backslash character prior to any comparisons.
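The wildcard rules can be sketched as follows in Python. The sketch assumes the wildcard value is "*" and the negation operator is "!", that the negated form may be written with or without an intervening space, and that a property's target values are held in a list which is empty when the property is undefined; `wildcard_matches` is a hypothetical name.

```python
import re

WILDCARD = "*"


def wildcard_matches(query_value, target_values):
    """Match one query value against the list of values defined for a property."""
    values = list(target_values or [])
    if query_value == WILDCARD:
        return bool(values)                 # any defined value matches
    if re.fullmatch(r"!\s*\*", query_value):
        return not values                   # negated wildcard: no value defined
    if query_value == "\\" + WILDCARD:
        return WILDCARD in values           # escaped literal "*"
    return query_value in values            # ordinary literal comparison
```

A quality assurance run could use the negated form to list every profile missing a mandatory property, for example any profile for which `wildcard_matches("!*", profile_values)` is true.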

5 General Architecture

Every registry service shares the following common features and qualities with regards to
its implementation and operation (see diagram below):

[Diagram: Generalized Media Archives; Profile(s)]

• MARS metadata profiles are collected from one or more archives, and combined into an
optimized, specialized database for performing searches, according to the nature of the
particular registry service.

• The internal organization and operation of the registry service is totally independent
from and ignorant of the internal organization and operation of each archive from which
it receives profiles.

• All registry services implement the MARS 'locate' action, and only that action, which
must be explicitly specified in every input query.

• Users (human or otherwise) submit MARS metadata search queries to the registry
service and receive zero or more MARS metadata profiles matching the search query,
possibly scored and ordered by relevance.

• The MARS metadata-based query interface completely hides the internal organization
and operation of the registry service from the user.

• The implementation of any registry service can be modified or even replaced entirely by
a different implementation with no impact to or dependency upon archives or users.

• New archives can contribute profiles to a registry service with no special knowledge or
modification by the registry service.

5.1 Defining Characteristics of a Registry Service

A registry service is defined by the following three characteristics:

1. the metadata properties it allows and requires in each profile

2. the metadata properties it allows and requires in a given search query

3. whether returned profiles are scored and ordered according to relevance

These three criteria define the interface by which the registry service interacts with all
source archives and all users.

All other criteria are hidden within and totally open to the particular implementation of the
registry service, so long as the implementation conforms to the general behavior and
operation otherwise defined for all registry services by this specification.

5.2 Generation of the Registry Database 

A particular registry service will extract from a given archive (or be provided by or on
behalf of the archive) the profiles for all targets of interest which a user may search on,
including all properties defined for each target which are relevant to the particular registry.

Depending on the nature of the registry, this may include profiles for abstract media
objects, media instances, and media components as well as physical storage items or even
qualified data items. Some property values for a profile may be dynamically generated
specifically for the registry, such as the automated identification or extraction of keywords
or index terms from the data content, or similar operations.

The profiles from several archives may be combined by the registry service into a single
search space for a given application or environment. The location and/or agency properties
serve to differentiate the source locations of the various archives from which the individual
profiles originate.

5.3 Resolution of Search Results

All registry services define and search over profiles, and those profiles define bodies of
information at either an abstract or physical scope; i.e. media objects, media instances,
media components, or storage items. A given registry database might contain profiles for
only a single level of scope or for several levels of scope.

If a query does not define any Identity properties, then the registry service must return all
matching profiles regardless of scope; however, if the query defines one or more Identity
properties, then all profiles returned by the registry service must be of the same level of
scope as the lowest scoped Identity property defined in the search query.

Note that a specific level of scope can be specified in a query by using the special wildcard
value for the scope of interest (e.g. "component=meta item=* ..." to find all storage
items within meta components which otherwise match the remainder of the query).

Each set of profiles returned for a given search may be optionally scored and ordered by
relevance, according to how closely they match the input query. The score must be returned
as a value to the MARS 'relevance' property. The criteria for determining relevance is up to
each registry service, but it must be defined as a percentage value where zero indicates no
match whatsoever, 100 indicates a "perfect" match (however that is defined by the registry
service), and a value between zero and 100 reflects the closeness of the match
proportionally. The scale of relevance from zero to 100 is expected to be linear.
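The scope rule can be illustrated with a short Python sketch. The grouping of Identity properties into object, instance, component, and item levels is an assumption drawn from the property lists in this document, and `scope_of` and `filter_by_scope` are hypothetical helpers that operate on profiles already matched against the rest of the query.

```python
# Assumed Identity scope levels, from highest to lowest.
SCOPE_LEVELS = [
    ("object", ("identifier",)),
    ("instance", ("release", "language", "coverage", "encoding")),
    ("component", ("component",)),
    ("item", ("item",)),
]


def scope_of(property_set):
    """Return the lowest scope level with a defined Identity property, or None."""
    lowest = None
    for level, names in SCOPE_LEVELS:
        if any(name in property_set for name in names):
            lowest = level
    return lowest


def filter_by_scope(query, matched_profiles):
    """Keep only profiles at the scope level implied by the query."""
    wanted = scope_of(query)
    if wanted is None:
        return list(matched_profiles)  # no Identity properties: any scope
    return [p for p in matched_profiles if scope_of(p) == wanted]
```

With this reading, a query containing "item=*" restricts the results to storage item profiles even when the registry also holds object and component profiles.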

5.4 Minimum and Maximum Thresholds

A registry service can be directed by a user, or by implementation, to apply two types of
thresholds to constrain the total number of profiles returned by a given search. Both
thresholds may be applied together to the same search results.

5.4.1 Maximum Size

The MARS 'size' property can be specified in the search query (or applied implicitly by the
registry service) to define the maximum number of profiles to be returned.

In the case that profiles are scored and ordered by relevance, the maximum number of
profiles are to be taken from the highest scoring profiles.



5.4.2 Minimum Relevance

The MARS 'relevance' property can be specified in the search query (or applied implicitly
by the registry service) to define the minimum score which must be equaled or exceeded by
every profile returned.

Note that specifying a minimum relevance of 100 requires that targets match perfectly,
allowing one to choose between best match and absolute match.
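Both thresholds of section 5.4 can be applied in one pass, as in this Python sketch. The function name `apply_thresholds` is hypothetical, and treating a profile without a 'relevance' value as scoring zero is an assumption made only to keep the example short.

```python
def apply_thresholds(profiles, size=None, relevance=None):
    """Apply the minimum relevance and maximum size thresholds to search results."""
    results = list(profiles)
    if relevance is not None:
        # Every returned profile must equal or exceed the minimum score.
        results = [p for p in results if p.get("relevance", 0) >= relevance]
    # Order by score so that a size limit keeps the highest scoring profiles.
    results.sort(key=lambda p: p.get("relevance", 0), reverse=True)
    if size is not None:
        results = results[:size]
    return results
```

Filtering by relevance before truncating by size ensures that the size limit never displaces a higher scoring profile in favor of a lower scoring one.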

5.5 Serialization of Input/Output

All property sets (including profiles and queries) which are received/imported by and
returned/exported from a registry service via a data stream must be encoded as XML
instances conforming to the MARS DTD. This includes sets of profiles extracted from a
given archive, search queries received from client applications, and sets of profiles returned
as the results of a search.

If multiple property sets are defined in a MARS XML instance provided as a search request,
then each property set is processed as a separate query, and the results of each query
returned in the order specified, combined in a single XML instance. Any sorting or
reduction by specified thresholds is done per each query only. The results from the separate
queries are not combined in any fashion other than concatenated into the single returned
XML instance.

Each registry service is free to organize and manage its internal registry database using
whatever means is optimal for that particular service. It is not required to utilize or preserve
any XML encoding of the profiles.

5.5.1 Human User Interface Recommendations

Most registry services will include an additional CGI or other web based component which
provides a human-usable interface for specifying queries and accessing search results. This
will typically act as a specialized proxy to the general registry service, converting the user
specified metadata to a valid MARS query and then mapping the returned XML instance
containing the target profiles to HTML for viewing and selection. Although such an
interface or proxy component is outside the scope of this specification proper, the following
recommendations, if followed, should provide for a certain degree of consistency between
various human user interfaces to registry services.

• The set of profiles should be presented as a sequence of links, preserving any ordering
based on relevance scoring.

• Each profile link should be encoded as an (X)HTML 'a' element within a block element
or other visually distinct element ('p', 'li', 'td', etc.).

• The URL value of the 'href' attribute of the 'a' element should be constructed from the
profile, based on the 'location' and/or 'agency' properties, which will resolve to the
content of (or access interface for) the target.

• If the 'relevance' property is defined in the profile, its value should begin the content of
the 'a' element, differentiated clearly from subsequent content by punctuation or structure
such as parentheses, comma, colon, separate table column, etc.

• If the 'title' property is defined in the profile, its value should complete the content of the
'a' element. Otherwise, a (possibly partial) MRN should be constructed from the profile
and complete the content of the 'a' element.

Examples:

<html>
<body>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(98) Foo</a>
</p>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(87) Bar</a>
</p>
<p>
<a href="http://xyz.com/GMA?action=retrieve&identifier=...">(37) Baz</a>
</p>
</body>
</html>

<html>
<body>
<table>
<tr>
<th>Score</th>
<th>Target</th>
</tr>
<tr>
<td>98</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Foo</a></td>
</tr>
<tr>
<td>87</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Bar</a></td>
</tr>
<tr>
<td>37</td>
<td><a href="http://xyz.com/GMA?action=retrieve&identifier=...">Baz</a></td>
</tr>
</table>
</body>
</html>
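
A proxy following these recommendations might render each profile roughly as in the Python sketch below. The function name `render_profile_link` and the 'mrn' dictionary key are hypothetical, and the target URL is passed in ready-made because the rule for deriving it from the 'location' or 'agency' properties is left to each implementation.

```python
from html import escape


def render_profile_link(profile, url):
    """Render one profile as an 'a' element inside a 'p' block element."""
    # Prefer the title; fall back to a (possibly partial) MRN if one is supplied.
    label = profile.get("title") or profile.get("mrn", "")
    if "relevance" in profile:
        # The relevance score begins the link content, set off by parentheses.
        label = f"({profile['relevance']}) {label}"
    return f'<p><a href="{escape(url, quote=True)}">{escape(label)}</a></p>'


def render_results(profiles_with_urls):
    """Render (profile, url) pairs in the given order, preserving relevance ordering."""
    return "\n".join(render_profile_link(p, u) for p, u in profiles_with_urls)
```

The sketch escapes the ampersand in the query string as `&amp;`, which keeps the generated markup well-formed.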

6 Core Registry Services

The following registry services are defined as sub-components of the Metia Framework. For
each registry service, a brief description is provided, as well as a specification of which
metadata properties are required or allowed for profiles and for queries. No discussion is
provided regarding the scoring and ordering of search results by relevance. Each registry
service is free to provide such functionality as needed and in a fashion optimal to the nature
of the particular registry service.

The 'action' property is required to be specified with the value 'locate' in all registry service
queries, therefore it is not included in the required query property specifications for each
registry service. Likewise, the 'relevance' and 'size' properties are allowed for all input
queries to all registry services, therefore they are also not explicitly listed in the allowed
query property specifications for each registry service.

6.1 Metadata Registry Service (META-REGS) 

META-REGS provides for searching the complete metadata property sets (including
inherited values) for all identifiable bodies of information, concrete or abstract; including
media objects, media instances, media components, storage items and qualified data items.

The results of a search are a set of profiles defining zero or more targets at the lowest level
of Identity scope for which there is a property defined in the search query. All targets in the
results will be of the same level of scope, even if the registry database contains targets at all
levels of scope.

The wildcard operator can be used to force a particular level of scope in the results. E.g. to
define media instance scope, only one instance property need be defined with the wildcard
operator value (e.g. "language=*"); to define media component scope, the component
property can be defined with the wildcard operator value (e.g. "component=*"); etc. The
registry service may not require nor expect that any particular instance property be used,
nor that only one property be used. It is not permitted for two or more instance properties to
have both wildcard and negated wildcard operator values in a given input query.

The default behavior is to provide the best matches for the specified query; however, by
defining in the input query a value of 100 for the 'relevance' property, the search results will
only include those targets which match the query perfectly. The former is most useful for
general browsing and exploration of the information space and the latter for collection and
extraction of specifically defined data.

6.1.1 Profile Properties

Required: All Identity properties required to uniquely identify the body of information in
question, as well as either the 'location' or 'agency' property.

Allowed: Any valid MARS property, presumably all defined MARS properties
applicable to the body of information in question. It is recommended that the
'title' property be defined for all profiles, whenever possible.






6.1.2 Query Properties

Required: No specific properties required. At least one property must be specified in the
search query other than the 'action' property.

Allowed: Any valid MARS property.

6J2 Content Registry Service (CON-REGS) 

10 CON-REGS provides for seaicbing the textual content of all media instances within the 

included archives. It corresponds to a traditional "fiee-t^t index" such as those employed 
by most web sites. 

The results of a search are a set of profiles defining zero or more data component data 
storage items or qualified data items. 

Profiles are defined only for data storage items and qualified data items (e.g. fragments) 
which belong to flie data component of a media instance. Other components and other items 
bdonging to the data component are net to be included in the search space of a C!OK*R£GS 
registry service. Note that in addition to actual fragment items, profiles for "virtual" 
20 firagmcnts can be defined using a combination of the 'pointer' and (if needed) 'size' 

properties, where appropriate for the media type (e.g. for specific sections of an XML 
document instance). 

For each data item, the 'keywords' property is defined as the unique, minimal set of index
terms for the item, typically corresponding to the morphological base forms (linguistic
forms independent of inflection, derivation, or other lexical variation) excluding common
"stop" words such as articles ("the", "a"), conjunctions ("and", "whereas"), or semantically
weak words ("is", "said"), etc. It is expected that the same tools and processes for distilling
arbitrary input into minimal forms are applied both in the generation of the registry
database as well as for all relevant input query values.
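
The distillation of arbitrary input into a unique, minimal set of index terms might be
sketched as follows. This is an illustration only: the stop-word list, the function names
and the crude plural-stripping stand-in for real morphological analysis are all assumptions
and are not defined by this specification.

```python
import re

# Hypothetical stop-word list; a real registry would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "and", "whereas", "is", "said"}


def base_form(word: str) -> str:
    """Stand-in for morphological analysis: strips a simple plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def distill(text: str) -> list[str]:
    """Return the unique index terms of the text, in first-seen order."""
    terms: list[str] = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue  # drop articles, conjunctions and weak words
        term = base_form(token)
        if term not in terms:
            terms.append(term)
    return terms


# The same function is applied to registry content and to query input.
index_terms = distill("The dogs and the dog barked")
query_terms = distill("Dogs")
```

Because both sides pass through the same function, the query term produced from "Dogs"
is identical to the index term produced from "dogs" or "dog" in the content.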

The scope of the results, such as whole data items versus fragments, can be controlled using
the 'fragment' property and the wildcard value operator for the scope of interest. E.g.,
"fragment=*" will force the search to only return profiles of matching fragments and not of
whole data items; whereas "fragment=!*" will only return profiles of matching whole data
storage items. If otherwise unspecified, all matching profiles for all items will be returned,
which may result in redundant information being identified.
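
The effect of the wildcard and negated wildcard values on the result scope might be
sketched as below. The sketch assumes that the wildcard is written `*` and its negation
`!*`, that profiles are plain dictionaries, and that a whole data storage item is one
whose profile defines no 'fragment' property; the function name is hypothetical.

```python
from typing import Optional


def filter_by_scope(profiles: list[dict], fragment: Optional[str] = None) -> list[dict]:
    """Restrict matching profiles according to the 'fragment' query value.

    "*"  keeps only profiles that define a 'fragment' property (fragments).
    "!*" keeps only profiles that do not define it (whole data storage items).
    None applies no restriction, so both kinds are returned.
    """
    if fragment is None:
        return list(profiles)
    if fragment == "*":
        return [p for p in profiles if "fragment" in p]
    if fragment == "!*":
        return [p for p in profiles if "fragment" not in p]
    raise ValueError("only the wildcard or negated wildcard value is allowed")


# Made-up profiles: one whole item and one fragment of it.
matching = [
    {"title": "Manual"},
    {"title": "Manual, section 2", "fragment": "2"},
]
only_fragments = filter_by_scope(matching, "*")
only_whole_items = filter_by_scope(matching, "!*")
```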

A human user interface will likely hide the definition of the 'fragment' property behind a
more mnemonic selection list or set of checkboxes, providing a single field of input for the
query keywords.

If a given value for the 'keywords' property contains multiple words separated by white
space, then all of the words must occur adjacent to one another in the order specified in the
target content. Note that this is not the same as multiple property values where each value
contains a single word. The set of all property values (string set) constitute an OR set, while
the set of words in a single property value (string) constitute a sequence (phrase) in the
target. White space sequences in the query property value can be expected to match any
white space sequence in the target content, even if those two sequences are not identical (i.e.
a space can match a newline or tab, etc.).
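
The phrase and OR-set semantics described above might be sketched with regular
expressions as follows. The function names are hypothetical, and case-insensitive
matching on word boundaries is an assumption made for the example rather than a
requirement stated here.

```python
import re


def value_pattern(value: str) -> re.Pattern:
    """Compile one 'keywords' value: its words must be adjacent and in order."""
    words = value.split()
    # Any run of white space in the target may separate two adjacent words.
    body = r"\s+".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches(values: list[str], content: str) -> bool:
    """True if at least one value of the OR set occurs in the content."""
    return any(value_pattern(v).search(content) for v in values if v.strip())


content = "Reset the\n\tcontrol unit before use."
phrase_hit = matches(["the control"], content)       # space matches newline + tab
wrong_order = matches(["unit control"], content)     # words present, order differs
or_set_hit = matches(["unit control", "reset"], content)
```

A single value with several words behaves as a phrase, whereas several single-word
values behave as alternatives, which is why the last call succeeds although its first
value does not match.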

A human user interface will have to provide a mechanism for defining multiple 'keywords'
property values as well as for differentiating between values having a single word and
values containing phrases or other white space delimited sequences of words. In the interest
of consistency across registry services, it is recommended that when a single value input
field is provided for the 'keywords' or similar property, white space is used to separate
multiple values by default and multi-word values are specially delimited by quotes to
indicate that they constitute the same value (e.g. the field [a b "c1 c2 c3" d] defines four
values, the third of which has three words).
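
The recommended input convention can be illustrated with the Python standard library.
Using `shlex.split` is a convenience chosen for this sketch, not something prescribed
by this specification; note that it also interprets single quotes and backslashes, which
a production interface might not want.

```python
import shlex


def parse_keywords_field(field: str) -> list[str]:
    """Split a single input field into 'keywords' values.

    White space separates values; double quotes group several words into one value.
    """
    return shlex.split(field)


values = parse_keywords_field('a b "c1 c2 c3" d')
```

The resulting list holds four values, the third of which is the three-word phrase.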

It is permitted for special operators or commands to CON-REGS to be interspersed within
the set of 'keywords' values, such as those controlling boolean logic, maximal or minimal
adjacency distances, etc. It is up to the registry service to ensure that no ambiguity arises
between CON-REGS operators and actual values nor between REGS special operators and
CON-REGS operators. REGS special operators always take precedence over any CON-REGS
operators.






6.2.1 Profile Properties 

Required: All Identity and Qualifier properties required to uniquely identify each data
storage item or qualified data item in question; either the 'location' or 'agency'
property; and the 'keywords' property containing a unique, minimal set of index
terms for the item in question.

Allowed: All required properties, as well as the 'title' property (recommended).

6.2.2 Query Properties 

Required: The 'keywords' property containing the set of index terms to search on (may
need to be distilled into a unique, minimal set of base forms by the registry
service).

Allowed: All required properties, as well as the 'fragment' property with either wildcard
value or negated wildcard value only.

6.3 Typological Registry Service (TYPE-REGS)

TYPE-REGS provides for searching the set of 'class' property values (including any 
inherited values) for all media instances according to the typologies defined for the 
information contained in the included archives. 

The results of a search are a set of profiles defining zero or more media instances.

In addition to the literal matching of property values, such as provided by META-REGS,
TYPE-REGS also matches query values to target values taking into account one or more
"IS-A" type hierarchies as defined by the typologies employed such that a target value
which is an ancestor of a query value also matches (e.g. a query value of "dog" would be
expected to match a target value of "animal"). If only exact matching is required (such that
e.g. "dog" only matches "dog") then META-REGS should be used.
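
Ancestor matching over an "IS-A" hierarchy might be sketched as follows. The typology
shown is a made-up example, and the representation as a child-to-parent dictionary
with a single parent per class is an assumption made for brevity.

```python
# Hypothetical typology: each class maps to its direct parent (None for a root).
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal", "animal": None}


def ancestors(value: str) -> set[str]:
    """Return the value itself plus every ancestor in the IS-A hierarchy."""
    seen: set[str] = set()
    current = value
    while current is not None and current not in seen:
        seen.add(current)
        current = PARENT.get(current)
    return seen


def type_match(query_values: list[str], target_classes: list[str]) -> bool:
    """True if a target class equals, or is an ancestor of, any query value."""
    targets = set(target_classes)
    return any(ancestors(q) & targets for q in query_values)


dog_matches_animal = type_match(["dog"], ["animal"])
animal_matches_dog = type_match(["animal"], ["dog"])
```

The match is directional: a query for "dog" finds targets classified as "animal", but a
query for "animal" does not find targets classified only as "dog".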

TYPE-REGS does not differentiate between classification values which belong to different
typologies nor for any ambiguity which may arise from a single value being associated with
multiple typologies with possibly differing semantics. It is only responsible for efficiently
locating all media instances which have defined values matching those in the input query. If
conflicts arise from the use of multiple typologies within the same environment, it is
recommended that separate registry databases be generated and referenced for each
individual typology.

6.3.1 Profile Properties

Required: The Identity properties which explicitly and completely define the media
instance, one or more values defined for the 'class' property, as well as either
the 'location' or 'agency' property.

Allowed: All required properties, as well as the 'title' property (recommended).

6.3.2 Query Properties

Required: The 'class' property containing the set of classifications to search on.

Allowed: Only the 'class' property is allowed in search queries.

6.4 Dependency Registry Service (DEP-REGS)

DEP-REGS provides for searching the set of Association property values (including any
inherited values) which can be represented explicitly using MARS Identity semantics for all
bodies of information in the included archives.

The results of a search are a set of profiles defining zero or more targets matching the
search query.

DEP-REGS is used to identify relationships between bodies of information within a given
environment such as a document which serves as the basis for a translation to another
language or a conversion to an alternate encoding, a high level diagram which summarizes
the basic characteristics of a much more detailed low level diagram or set of diagrams, a
reusable documentation component which serves as partial content for a higher level
component, etc. The ability to determine such relationships, many of which may be implicit
in the data in question, is crucial for managing large bodies of information where changes to
one media instance may impact the validity or quality of other instances.

For example, to locate all targets which immediately include a given instance in their
content, one would construct a query containing the 'includes' property with a value
consisting of a URI identifying the instance, such as an MRN. DEP-REGS would then
return profiles for all targets which include that instance as a value of their 'includes'
property. Similarly, to locate all targets which contain referential links to a given instance,
one would construct a query containing the 'refers' property with a value identifying the
instance.
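
Such a lookup might be sketched as follows. DEP-REGS compares equivalent URI forms
rather than raw strings; here that is approximated by a hand-written alias table, which is
an assumption of this sketch. The URI strings are made-up placeholders and do not
reflect actual MRN or Agency URL syntax, and the function names are hypothetical.

```python
# Hypothetical alias table mapping each known URI form to one canonical key.
CANONICAL = {
    "mrn:example:engine-diagram": "engine-diagram",
    "http://agency.example/cgi?id=engine-diagram": "engine-diagram",
}


def canonical(uri: str) -> str:
    """Map a URI to its canonical key; unknown URIs stand for themselves."""
    return CANONICAL.get(uri, uri)


def find_dependents(profiles: list[dict], prop: str, uri: str) -> list[dict]:
    """Return profiles whose Association property `prop` names the given instance."""
    wanted = canonical(uri)
    return [p for p in profiles
            if wanted in {canonical(v) for v in p.get(prop, [])}]


profiles = [
    {"title": "Service manual", "includes": ["mrn:example:engine-diagram"]},
    {"title": "Quick guide",
     "includes": ["http://agency.example/cgi?id=engine-diagram"]},
    {"title": "Brochure", "includes": ["mrn:example:logo"]},
]
including = find_dependents(profiles, "includes", "mrn:example:engine-diagram")
```

Both the service manual and the quick guide are returned, although they name the
included instance with different, non-identical URI strings.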

DEP-REGS can be seen as a specialized form of META-REGS, based only on the minimal
set of Identity and Association properties. Furthermore, in contrast to the literal matching of
property values such as performed by META-REGS, DEP-REGS matches Association
query values to target values by applying on-the-fly mapping between all equivalent URI
values when making comparisons, such as between an MRN and an Agency CGI URL, or
between two non-string-identical Agency CGI URLs which both define the same resource
(regardless of location). Note that if the META-REGS implementation provides such
equivalence mapping of URI values, then a separate DEP-REGS implementation is not
absolutely required; though one may still be employed on the basis of efficiency, given the
highly reduced number of properties in a DEP-REGS profile.

6.4.1 Profile Properties

Required: The Identity properties which explicitly and completely define the body of
information, all defined Association properties, as well as either the 'location'
or 'agency' property.

Allowed: All required properties, as well as the 'title' property (recommended).

6.4.2 Query Properties 
Required: One or more Association properties. 
Allowed: One or more Association properties. 

6.5 Process Registry Service (PRO-REGS)

PRO-REGS provides for searching over sequences of state or event identifiers (state chains)
which are associated with specific components of or locations within procedural
documentation or other forms of temporal information.

The results of a search are a set of profiles defining zero or more targets matching the 
search query. 

PRO-REGS can be used for, among other things, "process sensitive help" where a unique
identifier is associated with each significant point in procedures or operations defined by
procedural documentation, and software which is monitoring, guiding, and/or managing the
procedure keeps a record of the procedural states activated or executed by the user. At any
time, the running history of executed states can be passed to PRO-REGS as a query to
locate documentation which most closely matches that sequence of states or events, up to
the point of the current state, so that the user receives precise information about how to
proceed with the given procedure or operation exactly from where they are. The procedural
documentation would presumably be encoded using some form of functional markup (e.g.
SGML, XML, HTML), and the profiles identifying paths to states or steps in
the procedural documentation would be generated automatically based on analysis of the
data content, recursively extracting the paths of special state identifiers embedded in the
markup and producing a profile identifying a qualified data item to each particular point in
the documentation using the 'pointer' property.
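
One way to locate the documentation that most closely matches a running history is
sketched below. How closeness is measured is not defined here; the sketch assumes
that the best profile is the one whose state chain shares the longest run of trailing
states with the history, ending at the current state. The state names, profile layout
and function names are hypothetical.

```python
from typing import Optional


def common_suffix_len(chain: list[str], history: list[str]) -> int:
    """Count how many trailing states the two sequences share."""
    n = 0
    while n < len(chain) and n < len(history) and chain[-1 - n] == history[-1 - n]:
        n += 1
    return n


def closest_profile(profiles: list[dict], history: list[str]) -> Optional[dict]:
    """Return the profile whose 'class' state chain best matches the history."""
    best, best_score = None, 0
    for profile in profiles:
        score = common_suffix_len(profile["class"], history)
        if score > best_score:  # a score of 0 means the current state differs
            best, best_score = profile, score
    return best


profiles = [
    {"title": "Filling after opening", "class": ["start", "open", "fill"]},
    {"title": "Filling after skipping", "class": ["start", "skip", "fill"]},
    {"title": "Opening", "class": ["start", "open"]},
]
help_topic = closest_profile(profiles, ["start", "open", "fill"])
```

Two profiles end in the current state "fill", and the one reached through "open" is
preferred because it agrees with more of the user's history.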

6.5.1 Profile Properties 

Required: The Identity properties which explicitly and completely define the body of
information, the 'class' property defining the sequence of state identifiers up to
the information in question, as well as either the 'location' or 'agency' property.

Allowed: All required properties, as well as the 'title' property (recommended).

6.5.2 Query Properties

Required: The 'class' property defining a sequence of state identifiers based on user
navigation history.

Allowed: Only the 'class' property is allowed in search queries.

Claims

1. A query resolution system comprising one or more archives containing a plurality of persistent data entities, each
entity including metadata in the form of a group of properties having property values assignable thereto, at least
some of those properties providing a definition of a predetermined level of scope such that within a set of related
data entities, the scope of an entity at a higher level encompasses the scope of related entities at a lower level of
scope, a registry database operable to extract from said one or more archives those data entities having
predetermined properties including said definition of a predetermined level of scope and a query resolution engine
operable in response to a request from a query interface to identify extracted entities whose property values fulfil the
request.

2. A system as claimed in Claim 1, including an encoder operable to ensure said registry database is accessed
utilising a common format.

3. A system as claimed in Claim 1 or Claim 2, including a web based interface operable to map between a first user
input format and a said query interface format.

4. A system as claimed in any preceding Claim, wherein said query resolution engine is operable to provide an
indication of the relevance of extracted entities in relation to said request.

5. A query resolution service for use in an object-oriented programming environment including one or more archives
containing a plurality of persistent data entities, each entity including metadata in the form of a group of properties
having property values assignable thereto, at least some of those properties providing a definition of a
predetermined level of scope such that within a set of related data entities, the scope of an entity at a higher level
encompasses the scope of related entities at a lower level of scope, the service comprising extracting from said one or
more archives those entities having predetermined properties including said definition of a predetermined level of
scope and identifying, in response to a request, those extracted entities whose property values fulfil said request.

6. A service as claimed in Claim 5, wherein an indication of relevance to said properties set out in said request is
generated for each identified entity.

7. A computer program comprising executable code for execution in an object-oriented programming environment,
wherein the environment is operable in accordance with said code to provide the service according to Claim 5 or
Claim 6.

8. A program as claimed in Claim 7, stored in a computer readable medium.

9. A program as claimed in Claim 7 or Claim 8 wherein the environment comprises one or more computational devices.

10. A program as claimed in any one of Claims 7 to 9, in which the computational devices are networked.

11. A registration database for connection to one or more archives containing a plurality of persistent data entities,
each entity including metadata in the form of a group of properties having property values assignable thereto, at
least some of those properties providing a definition of a predetermined level of scope such that within a set of
related data entities, the scope of an entity at a higher level encompasses the scope of related entities at a lower
level of scope, the database being operable to extract from said one or more archives those data entities having
predetermined properties including said definition of a predetermined level of scope.

12. A registry database as claimed in Claim 11, including a query resolution engine operable in response to a request
from a query interface to identify extracted entities whose property values fulfil the request.

13. A terminal for connection to a registration database, said database being connected to one or more archives
containing a plurality of persistent data entities, each entity including metadata in the form of a group of properties
having property values assignable thereto, at least some of those properties providing a definition of a
predetermined level of scope such that within a set of related data entities, the scope of an entity at a higher level
encompasses the scope of related entities at a lower level of scope, the database being operable to extract from said
one or more archives those data entities having predetermined properties including said definition of a
predetermined level of scope, the terminal being operable in response to user input to generate a request to identify
extracted entities whose property values are defined in said input.

14. A terminal as claimed in Claim 13, wherein said terminal is operable to display said identified entities to said user.